The ThoroughMetrics Blog: May 2008

Saturday, May 31, 2008

First Report Complete

If you don't want to read any shameless self-promotion, feel free to skip this blog entry. That said, my first full research report is complete and available for purchase. It compares the racing results of horses purchased at two-year-old auctions (specifically Fasig-Tipton Selected Two Year Olds in Training at Calder) with the results of those purchased at yearling auctions (Fasig-Tipton Selected Yearlings at Saratoga). The results are a real eye-opener...if you're spending money at the wrong type of auction for the price range of horse you're buying, you're potentially wasting a lot of money, and could achieve much better results in the long run by limiting yourself to the right type of auction. I'll be putting some information about the study up on my business site ThoroughMetrics in the next few days, but for now, if you'd like to learn more about the study, send me an email at zelvin30@hotmail.com.

Sometime in the next few days, I'll start on my next study. I'm tentatively planning to look at which is a better predictor of success for the offspring of a mare...the mare's racing ability, or the success of her previous offspring. I haven't thought through all the details yet, but I picture the following high-level steps:
1. Get a fairly random list of horses foaled in the same year.
2. For each horse, record some measure of its racing success (probably SSI), sire, sire's stud fee in the year conceived, dam, dam's SSI, number of previous foals for the dam, median SSI of the dam's previous children, and mean SSI of the dam's previous children.
3. Calculate correlations between the horse's SSI and the dam's SSI, between the horse's SSI and the dam's median offspring SSI, and between the horse's SSI and the dam's mean offspring SSI.
4. Divide horses into tiers based on the stallion's stud fee and the dam's SSI/median offspring SSI/mean offspring SSI, and compare the horses' SSI in each category. This should control for the fact that dams of one 'type' (good racer, good breeder) may tend to be sent to better stallions.
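To make steps 3 and 4 a bit more concrete, here's a minimal Python sketch. The records, the $20,000 stud-fee cutoff, and the 1.0 'good producer' cutoff on median offspring SSI are all made-up assumptions for illustration, not data or thresholds from an actual study. The correlation is a plain Pearson coefficient, written out by hand.

```python
from collections import defaultdict
from statistics import mean, median

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up records, one per foal of the same crop:
# (foal's SSI, sire's stud fee when conceived, dam's SSI, SSIs of dam's previous foals)
horses = [
    (2.1, 50_000, 1.8, [1.2, 3.0, 0.4]),
    (0.6, 7_500, 0.9, [1.5, 0.8]),
    (1.4, 25_000, 2.5, [0.9, 0.7, 1.3, 1.6]),
    (0.3, 5_000, 0.2, [0.1, 0.6]),
    (3.2, 100_000, 1.1, [2.4, 4.0]),
    (0.9, 30_000, 1.5, [0.3]),
]

foal_ssi = [h[0] for h in horses]
predictors = {
    "dam's own SSI": [h[2] for h in horses],
    "median SSI of previous foals": [median(h[3]) for h in horses],
    "mean SSI of previous foals": [mean(h[3]) for h in horses],
}

# Step 3: correlate each candidate predictor with the foal's SSI.
correlations = {name: pearson(values, foal_ssi) for name, values in predictors.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:.2f}")

# Step 4: compare foals within stud-fee / dam-quality cells, so a dam's
# "type" isn't confounded with the quality of stallion she was sent to.
FEE_CUTOFF = 20_000     # hypothetical tier boundary
PRODUCER_CUTOFF = 1.0   # hypothetical boundary on median offspring SSI

cells = defaultdict(list)
for ssi, fee, dam_ssi, prev_foals in horses:
    fee_tier = "high fee" if fee >= FEE_CUTOFF else "low fee"
    dam_tier = "good producer" if median(prev_foals) >= PRODUCER_CUTOFF else "weak producer"
    cells[(fee_tier, dam_tier)].append(ssi)

for (fee_tier, dam_tier), ssis in sorted(cells.items()):
    print(f"{fee_tier:8} | {dam_tier:13} | n={len(ssis)} | mean SSI {mean(ssis):.2f}")
```

With real data the tier cutoffs would come from the actual distribution of fees and SSIs, and each cell would need enough horses for its average to mean anything.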

The thing that makes this interesting to me (other than the possible utility of the results) is that there's a tradeoff here. The success of a dam's previous offspring is obviously a more direct measure of what we're looking for: the ability to pass on racing ability to children. However, it's also something that has VERY high variance. So many great racehorses have had half (and full) siblings who were duds on the track. The success of a mare during her racing career is a much more indirect measure of what we're looking for...but it's also much lower variance. Despite a similarly small sample size, the results of a horse's races are much more consistent than the performance of its children.

If I had to guess, I'd expect the mare's racing ability to have slightly more influence than the success of her previous offspring. But the beauty of statistical research is that I don't have to guess...I can actually measure it.

I'd love to hear people's thoughts on the idea...is it a useful study? Are there any obvious flaws in the high level study design that I've described?

Tuesday, May 27, 2008

Unique Opportunity

I think there are two approaches that can be profitable in any type of speculation, whether as a thoroughbred owner, stock market investor, or gambler. My usual preference (and the approach my ThoroughMetrics business uses) is to look for confirmed 'edges' that can be applied many times, leading to a virtual certainty of long-term profit. This requires situations that come up often enough that statistically valid back-testing can be done to test theories, and that the edge can be applied frequently enough in the future to be worth something.

The other approach is to be opportunistic, and wait until a unique situation presents itself that has so many factors in its favor that your experience and intuition tell you that you have an immense edge. These situations can't be statistically validated, because they occur so infrequently and each one is relatively unique. The key to these is to be patient and impartial when evaluating them. You should also be able to articulate in clear and specific terms your reasons for thinking that they represent a profitable opportunity...otherwise it's likely that you're fooling yourself, and just WANT to find such an opportunity.

I believe that there's a chance that one of these opportunities may present itself in the Belmont Stakes this year.

Most people seem to expect Big Brown to go off as about a 1-5 favorite. I think most would agree that he actually deserves to be somewhere around a 1-2 favorite. He's unbeaten, he's beaten most of his opponents, he's won at a longer distance than any of them, and he's run faster Beyers and a much faster Ragozin than any of them. While there are several good horses in the race, none of them are proven stars yet. His strongest opponent (Casino Drive) has raced only twice, winning a Grade 2 race by 5 lengths with a Beyer of 101, and has never run longer than a mile and an eighth. Yes, he looked good in the Peter Pan, yes his pedigree suggests he should love the distance of the Belmont, and yes Bernardini had similar credentials before the 2006 Preakness...but I believe that fair odds on him would be around 5-1. His next strongest opponent (Denis of Cork) finished around 10 lengths back in the Kentucky Derby. Yes, he made a big comeback from last place in that race, but that running style tends to be overrated for the Belmont Stakes. He also probably deserves to be about 5-1 or 6-1. Assuming everyone else's odds should be substantially higher, I believe 1-2 is about right for Big Brown.

So is the opportunity I'm looking at to bet against Big Brown, since people expect him to be at far lower odds?

I don't think so. I believe there's a chance that he may go off at much higher odds than expected. Possibly even money or worse. Here's why:

1. Japanese fans have shown up in large numbers for overseas races in the past, and bet their horses down to ridiculously low odds. I'm not sure that will happen for Casino Drive since he wasn't an established star in Japan, but if it does happen it could have a huge impact on the betting.

2. Many pedigree geeks seem to be overrating the influence of Casino Drive's pedigree. Yes, he almost certainly can handle the mile and a half distance of the Belmont, but there's not some mystical ability that Better Than Honour has passed on to her children to automatically win the race.

3. The ten previous failures since the last Triple Crown winner seem to have convinced a lot of people that it's become almost impossible. Instead of understanding that the near misses by horses like Real Quiet and Silver Charm showed that it CAN still be done, people are interpreting the bad luck and near misses as showing that it CAN'T, and some people will bet against Big Brown simply because of this.

4. Big Brown's quarter cracks are going to scare off some more potential support at the windows, despite the fact that many horses have run with worse, with little or no impact on their performance.

In summary, I think there's a real possibility that Big Brown could go off at around even money or slightly higher. And if so, I think it's a fabulous opportunity to back a likely winner at generous odds.
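The argument above comes down to simple odds arithmetic. Fractional odds of a-b imply a win probability of b/(a+b), so a 1-2 line corresponds to about a 67% chance and 5-1 to about 17%. As a minimal sketch, the Python below takes the post's 1-2 estimate as the true probability (an assumption, not a measured fact) and compares the expected profit per $1 at the expected 1-5 price with the even-money price. Tote odds already reflect the takeout; breakage is ignored.

```python
from fractions import Fraction

def implied_probability(odds_against, odds_for=1):
    """Win probability implied by fractional odds quoted as 'against-for' (5-1, 1-2)."""
    return Fraction(odds_for, odds_against + odds_for)

def expected_profit(win_prob, odds_against, odds_for=1, stake=1):
    """Expected profit of a win bet at the given odds (breakage ignored)."""
    profit_if_win = Fraction(stake * odds_against, odds_for)
    return win_prob * profit_if_win - (1 - win_prob) * stake

# The post's fair line on Big Brown is 1-2, i.e. a 2/3 chance of winning.
fair_p = implied_probability(1, 2)

for label, against, for_ in [("1-5 (expected price)", 1, 5), ("even money", 1, 1)]:
    profit = expected_profit(fair_p, against, for_)
    print(f"{label}: expected profit per $1 = {profit!s}")
# 1-5 (expected price): expected profit per $1 = -1/5
# even money: expected profit per $1 = 1/3
```

Under that assumption, the same horse goes from losing about 20 cents per dollar bet at 1-5 to gaining about 33 cents per dollar at even money.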

Friday, May 23, 2008

Searching For Hidden Gold

I’d like to get some suggestions from readers on a study I’ll be starting in a few weeks. Basically, I’ll be looking for factors that influence the likelihood of a claiming horse eventually competing successfully in allowance or stakes races. If you have any suggestions for factors to look at in the study, please let me know!

Some of the factors I’m already planning to study include:
1. Age – I assume a younger horse has a better chance of substantial improvement than an older horse. I’d like to measure just how much better.
2. Experience – I assume the fewer previous races a horse has had, the better chance that it has some hidden abilities. I might also try to factor in its experience on a variety of surfaces and tracks.
3. Sex – The sex of a horse could affect the likelihood of it being put into claimers despite having the potential for improvement.
4. Trainer – I’d actually prefer to look at owner/trainer combinations, except that I think most wouldn’t have large enough sample sizes to learn much of value. I’d guess that some trainers aren’t as good as others at recognizing a horse’s future potential, and that some may enter their horses more aggressively at the lower claiming levels in order to try to win races.
5. Breeding – I’d guess that the higher the sire’s stud fee, the more of a dud the owner and trainer have to consider the horse before they’ll enter it in mid- and low-level claiming races.
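As a starting point, one simple way to look at each of these factors is the share of claimers that later moved up, broken out by one factor at a time. The Python sketch below assumes a hypothetical record layout (age, starts, sex, trainer, sire’s stud fee, and a `moved_up` flag) and made-up records. Because it looks at each factor on its own, it can’t separate overlapping effects such as younger horses also tending to have fewer starts; a real version would likely need to consider the factors jointly.

```python
from collections import defaultdict

# Hypothetical claiming-race records, one per horse. "moved_up" marks whether
# the horse later competed successfully in allowance or stakes company.
# Sex: F = filly/mare, M = colt/horse/gelding.
claimers = [
    {"age": 3, "starts": 4, "sex": "F", "trainer": "A", "sire_fee": 25_000, "moved_up": True},
    {"age": 3, "starts": 7, "sex": "M", "trainer": "B", "sire_fee": 5_000, "moved_up": False},
    {"age": 4, "starts": 12, "sex": "F", "trainer": "A", "sire_fee": 10_000, "moved_up": False},
    {"age": 5, "starts": 20, "sex": "M", "trainer": "C", "sire_fee": 40_000, "moved_up": False},
    {"age": 3, "starts": 2, "sex": "M", "trainer": "C", "sire_fee": 15_000, "moved_up": True},
    {"age": 4, "starts": 6, "sex": "F", "trainer": "B", "sire_fee": 30_000, "moved_up": True},
]

def rate_by(records, key):
    """(moved-up count, total, share moved up) for each level of key(record)."""
    tally = defaultdict(lambda: [0, 0])
    for r in records:
        counts = tally[key(r)]
        counts[0] += r["moved_up"]   # True counts as 1, False as 0
        counts[1] += 1
    return {level: (up, n, up / n) for level, (up, n) in tally.items()}

# Hypothetical bucketings for the factors listed above.
factors = {
    "age": lambda r: r["age"],
    "experience": lambda r: "<= 5 starts" if r["starts"] <= 5 else "> 5 starts",
    "sex": lambda r: r["sex"],
    "trainer": lambda r: r["trainer"],
    "sire fee": lambda r: "fee >= $20k" if r["sire_fee"] >= 20_000 else "fee < $20k",
}

for name, key in factors.items():
    print(name)
    for level, (up, n, rate) in sorted(rate_by(claimers, key).items()):
        print(f"  {level}: {up}/{n} moved up ({rate:.0%})")
```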

Anybody have any other suggestions?

Wednesday, May 21, 2008

Predicting the Past

I’ve mentioned before that I think most research done on the thoroughbred industry doesn’t use correct research methods and statistical analysis. One of the really common errors I’ve noticed is studies that don’t actually evaluate the predictive value of their findings. Theories are tailored to produce the best possible fit with what’s happened in the past, but it is just assumed that they will predict the future equally well.

There are at least three things we can do to avoid drawing false conclusions of this sort:

1. Apply common sense. While the basic premise of this blog and my ThoroughMetrics research business is that relying on common sense isn’t enough, that doesn’t mean we should do without it entirely. For an extreme example, no matter what the data tell us, we wouldn’t put any faith in a theory that says “dams whose names begin with the letter S outperform dams whose names begin with any other letter”. While that may accurately describe what has happened in the past, there’s no logical reason to expect it to help us accurately predict the future.
2. Look for ‘smooth’ patterns in the results. For example, before the Florida Derby, there was a lot of talk about how Big Brown’s outside post position would make it tougher for him to win the race. Horses in the outside positions in routes run at Gulfstream have very low winning percentages. Some people discounted this with the argument that ‘horses in the 4 slot also have a low winning percentage’. That’s nonsense. If 1, 2, 3, 5, 6, and 7 have solid winning percentages, while 4, 8, 9, 10, 11, and 12 have low winning percentages (with 11 and 12 having no winners prior to Big Brown), it’s a safe bet that post position 4’s low winning percentage is a fluke, since 2, 3, 5, and 6 do just fine. On the other hand, the outer post positions impose a real handicap on a horse’s chance of winning…none of them have yielded good results.
3. Test multiple, independent samples to confirm the results. For example, if you’re looking at a variable that you think will best identify the top sires, don’t just lump all the data together. Try looking at it for individual years, and seeing how consistent the results are from one year to the next. You may find that rankings you thought had real predictive value simply describe what has already happened, rather than predict what will happen. This is particularly true with variables that can be thrown off by a small number of aberrations. Because of Big Brown, Boundary is likely to be one of this year’s leading sires based on total earnings of offspring. I hope nobody will now consider Boundary a top sire (forgetting for the moment that he’s already been pensioned).
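One common way to check the year-to-year consistency described in point 3 is Spearman’s rank correlation between the sire rankings from different years: close to 1 means the rankings carry over, close to 0 means they mostly describe what already happened. Below is a minimal Python sketch; the sire names, years, and metric values are hypothetical, and the shortcut formula used assumes there are no tied values.

```python
def ranks(values):
    """Rank each value (1 = highest). Assumes no ties, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical sire metric (say, average earnings per starter in $1,000s),
# computed separately for each year instead of lumping the years together.
sire_metric = {
    "Sire A": {2005: 41, 2006: 38, 2007: 44},
    "Sire B": {2005: 30, 2006: 35, 2007: 29},
    "Sire C": {2005: 55, 2006: 22, 2007: 31},
    "Sire D": {2005: 18, 2006: 20, 2007: 17},
    "Sire E": {2005: 26, 2006: 47, 2007: 24},
}

years = [2005, 2006, 2007]
sires = sorted(sire_metric)
consistency = {}
for y1, y2 in zip(years, years[1:]):
    rho = spearman([sire_metric[s][y1] for s in sires], [sire_metric[s][y2] for s in sires])
    consistency[(y1, y2)] = rho
    print(f"{y1} vs {y2}: Spearman rho = {rho:.2f}")
```

On this made-up data the rankings barely carry over from one year to the next, which is exactly the warning sign point 3 is meant to catch.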

Saturday, May 17, 2008

Inefficient Markets

Stock market experts have argued for the past four decades about whether the stock market is efficient or not. Those who argued for efficiency were basically saying that future performance of stocks was essentially random, because as soon as some mispricing or inefficiency offered a profit opportunity, someone, somewhere would take advantage of it, and in the process eliminate the opportunity for others. Those who argued against it pointed to the performance records of some of the world's greatest investors, compiled over many years, and how unlikely that level of performance would be if it was truly impossible to have an edge. More recently, both camps have generally adopted the more moderate (and correct) opinion that the stock market is very (but not completely) efficient. Opportunities for outperforming the market as a whole do exist, but they're VERY hard to find, because you essentially need to be the only one to identify them to profit from them, and the patterns that work now will be discovered and eliminated, requiring constant adjustments to strategy.

Whenever I'm evaluating an opportunity for speculation (and I'd include both investing and handicapping in this category), I think about whether I'm operating in an efficient 'market', and what that means for me.

In my opinion, the reason the stock market is so efficient (and tough to 'beat') is that there are an unlimited number of participants (so there's a greater chance that somebody is going to find each profit opportunity) and participants can 'bet' as much as they want (so just one or two people finding an inefficiency can completely eliminate it).

The same situation exists with parimutuel betting. Anyone can participate (even more true with the advent of online betting), and they can bet as much as they want. This means that (like the stock market) you really need to be one of the very best handicappers to come out ahead. That's particularly true when you add in the fact that handicapping (unlike the stock market generally) is a negative-sum game...overall the payouts are less than the total amount invested.
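To see why parimutuel betting is negative-sum, here's a minimal sketch of how a win pool pays out. The takeout comes off the top before the winning tickets split what's left, so whichever horse wins, bettors as a group collect less than they put in. The 16% takeout and the pool amounts are hypothetical (actual takeout varies by track and bet type), and breakage and minimum payouts are ignored.

```python
def win_payouts(pool_by_horse, takeout):
    """$2 win payout for each horse if it wins (breakage and minimums ignored)."""
    net_pool = sum(pool_by_horse.values()) * (1 - takeout)
    return {horse: 2 * net_pool / bet for horse, bet in pool_by_horse.items()}

TAKEOUT = 0.16  # hypothetical takeout rate
pool = {"Horse 1": 60_000, "Horse 2": 25_000, "Horse 3": 15_000}  # $ bet to win on each

payouts = win_payouts(pool, TAKEOUT)
total_bet = sum(pool.values())
for horse, payout in payouts.items():
    collected = payout / 2 * pool[horse]  # what all winning tickets collect together
    print(f"If {horse} wins: ${payout:.2f} per $2; bettors collect "
          f"${collected:,.0f} of the ${total_bet:,} wagered")
```

Whichever horse wins, the winning tickets split the same net pool, so the crowd as a whole always gives up the takeout; only bettors who are better than the crowd by more than that margin come out ahead.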

Ownership of horses does not necessarily operate as efficiently. While in theory anybody can participate, not everyone does. At an auction, or when evaluating a horse to claim, you may really only be competing against a few hundred other people. If you're operating at the high end of the market, the number may be even lower. How many people really are involved in the bidding for Big Brown's stud future right now? 2? 3? Maybe 4? Only one of those has to overvalue him for IEAH to hit the jackpot. The less efficient a market is, the more important it is to know who you're competing against, and the more profit opportunities you'll find if you're good, but not the best, at something. As an owner, you should be looking for opportunities like those found at a poker table or in a fantasy baseball league, where you're competing against the same 8 or 9 people over and over, and if you're the best, you're generally going to come out ahead in the long run.

This might mean playing the claiming game at a smaller track with fewer claiming stables, it might mean hoarding a top stallion's offspring because you know there's a buyer willing to consistently overpay for them, or it might mean privately purchasing horses so that you're in situations where you're competing one on one with an unsophisticated seller.

Wednesday, May 14, 2008

Claiming Races

While the premise behind ThoroughMetrics is that the thoroughbred industry is short on really good statistical analysis and statistically valid research, that's not entirely true. The problem is that the vast majority of the quality research has focused on what handicappers need to know, not what owners and breeders need to know. A perfect example that occurred to me is what kind of research into trainer or trainer/owner performance in claiming races would be most useful. Handicappers are most interested in what happens while a horse is under the care of a trainer. What patterns does the trainer have success with? Do the horses the trainer claims improve while under his care? Owners, on the other hand, should be more interested in what happens AFTER horses get claimed from a trainer or owner. Were they trained especially hard, causing ongoing health problems? Is a drop in class from that trainer (or owner) a warning sign that something is wrong and the stable is trying to unload the horse? If I can get my hands on the right data (which, it appears, is going to be a recurring theme for me) I'd like to study whether there are stables whose horses should never be claimed, or should be avoided in certain situations (for example, a sharp drop in class coming off an OK performance).
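As a rough sketch of the owner-side question, the Python below groups horses by the stable they were claimed from and averages the number of starts they made in the following 12 months, optionally looking only at sharp drops in class. Everything here is an assumption for illustration: starts after the claim as a proxy for soundness, a 40% price drop as 'sharp', a three-claim minimum sample, and the stable names and records themselves.

```python
from collections import defaultdict
from statistics import mean

MIN_CLAIMS = 3     # hypothetical minimum sample before judging a stable
SHARP_DROP = 0.40  # hypothetical: claiming price at least 40% below the previous race

# Hypothetical claims: (stable the horse was claimed from,
#  fractional class drop in the race it was claimed out of,
#  starts made in the 12 months after the claim)
claims = [
    ("Stable X", 0.50, 1), ("Stable X", 0.45, 0), ("Stable X", 0.10, 6),
    ("Stable X", 0.60, 2), ("Stable Y", 0.50, 7), ("Stable Y", 0.00, 8),
    ("Stable Y", 0.20, 5), ("Stable Z", 0.25, 4), ("Stable Z", 0.00, 9),
]

def post_claim_starts(claims, only_sharp_drops=False):
    """Mean starts after the claim, per stable, for stables with enough claims."""
    by_stable = defaultdict(list)
    for stable, drop, starts_after in claims:
        if only_sharp_drops and drop < SHARP_DROP:
            continue
        by_stable[stable].append(starts_after)
    return {s: mean(v) for s, v in by_stable.items() if len(v) >= MIN_CLAIMS}

print("all claims:", post_claim_starts(claims))
print("sharp drops only:", post_claim_starts(claims, only_sharp_drops=True))
```

A real study would compare each stable against the overall average for similar horses, and would need far larger samples than this before labeling any stable's horses as ones to avoid.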

Monday, May 12, 2008

Turning An Idea Into A Study - Part 2

Actually, the other three issues I identified that need to be taken into account in designing my study of which sires produce sound offspring are relatively simple to address. As a reminder, here they are:

2. Was sample size sufficient to draw meaningful conclusions (the poster actually mentioned this one).

3. Were results being skewed by the fact that some horses in the data might not have finished their racing careers yet? If so, that would have more impact on the data for younger sires, who have fewer crops that have completed their careers.

4. Were unraced horses included in the data? In most cases, the reason for not racing would presumably be lack of soundness.

Sample size can be addressed easily...just include a relatively full set of data. I'm mostly interested in sires that have been successful to some degree, so all of them will have had many children in each crop. I'll look at all of those children, unless data is hard to find for some (like those who raced overseas).

The issue of data being skewed by horses who haven't finished their careers yet will not affect the study as I'm approaching it, since we're looking at year-by-year results rather than 'full career' results. Yet another advantage to doing it this way!

I definitely want the data to reflect horses that never raced. Other people have shown this data (% starters for the sire), but I think it's an important part of the data on soundness, and I'll include it too.
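Keeping unraced foals in the data and keeping an eye on sample size fit together naturally. As a sketch, the Python below computes each sire's percentage of starters along with a rough normal-approximation standard error; the sires and crop sizes are hypothetical. Two sires with similar starter percentages can carry very different amounts of uncertainty depending on how many foals are behind the number.

```python
from math import sqrt

def starter_rate(foals_started):
    """Share of foals with at least one start, plus a rough standard error
    (normal approximation). Unraced foals stay in the denominator."""
    n = len(foals_started)
    p = sum(foals_started) / n
    return p, sqrt(p * (1 - p) / n)

# Hypothetical crops: True = has started, False = unraced (kept in the data).
crops = {
    "Sire A": [True] * 52 + [False] * 18,
    "Sire B": [True] * 7 + [False] * 3,
}

for sire, foals in crops.items():
    p, se = starter_rate(foals)
    print(f"{sire}: {p:.0%} starters from {len(foals)} foals "
          f"(roughly +/- {1.96 * se:.0%} at 95%)")
```

Here the smaller crop's interval comes out nearly three times as wide, even though the two starter percentages are close.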

Friday, May 9, 2008

Turning An Idea Into A Study - Part 1

In my last post I talked about one approach to determining which sires tend to produce sound offspring, and some potential pitfalls in studying the topic using that approach. Here I'll discuss some steps that can be taken to ensure that a study using this approach eliminates the first (and most complex) of the biases introduced by those issues:

Offspring of better sires will tend to retire earlier due to breeding value.

One thing to keep in mind is that this will affect males and females differently. The effect will be strongest among males, but only those with enough ability (or good enough breeding) to be sent to the breeding shed. In females, the effect will be weaker, but will affect virtually all of them. So we can't fully solve the problem by studying just one sex or the other, but perhaps we can by limiting the study to males who are not ultimately used for stud duty. Another option would be to look at geldings only...but then we run into sample size issues, particularly for the top sires, where owners will be very reluctant to geld their sons.

One possible approach to eliminating this as a factor would be to look at average starts per year of racing, instead of average starts per career. If we did that, we'd need to make sure to control for age in some way though, since horses will tend to race more in the prime of their careers than as two-year-olds. One bias in the data we should be aware of here is that cheaper horses will tend to race more, regardless of soundness. So the pampered children of star sires like A.P. Indy may come out looking less sound than they really are.
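One way to control for age when using starts per year of racing is to compare each horse-season with the average for all horses of the same age, then average those ratios by sire. The Python below is a minimal sketch of that idea; the records are hypothetical, each record is assumed to be a season in which the horse made at least one start, and it does nothing about the cheaper-horses-race-more bias.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical horse-season records: (sire, age that season, starts that season).
# Each record is assumed to be a season with at least one start.
seasons = [
    ("Sire A", 2, 2), ("Sire A", 3, 6), ("Sire A", 3, 5), ("Sire A", 4, 7),
    ("Sire B", 2, 4), ("Sire B", 2, 3), ("Sire B", 3, 9), ("Sire B", 4, 9),
]

def age_adjusted_starts(seasons):
    """Average, per sire, of each season's starts divided by the all-horse
    average starts for that age (1.0 = typical for the age)."""
    by_age = defaultdict(list)
    for _, age, starts in seasons:
        by_age[age].append(starts)
    age_avg = {age: mean(starts) for age, starts in by_age.items()}

    ratios = defaultdict(list)
    for sire, age, starts in seasons:
        ratios[sire].append(starts / age_avg[age])
    return {sire: mean(r) for sire, r in ratios.items()}

for sire, score in sorted(age_adjusted_starts(seasons).items()):
    print(f"{sire}: {score:.2f} x the age-adjusted average")
```

Comparing each season with same-age peers means a sire whose foals happen to have raced mostly at two isn't penalized for it.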

Next post, I'll start looking at the other three issues I brought up regarding this research topic.

Thursday, May 8, 2008

Producing Sound Offspring

While I truly believe that statistical analysis can ultimately answer many questions about what factors impact breeding and racing results, I also believe that most of the answers are not as straightforward as they seem.

As an example, someone on one of the discussion boards I visit recently asked, "Which sires consistently produce sound offspring?" In response, someone else posted a list of some well-known sires, along with the average number of starts for their offspring. As a general approach, this seems reasonable, and it's far better than going purely on reputation. That said, none of the following potential sources of bias were addressed:

1. Were the offspring of better sires retired earlier because of breeding value, rather than unsoundness?

2. Was sample size sufficient to draw meaningful conclusions (the poster actually mentioned this one).

3. Were results being skewed by the fact that some horses in the data might not have finished their racing careers yet? If so, that would have more impact on the data for younger sires, who have fewer crops that have completed their careers.

4. Were unraced horses included in the data? In most cases, the reason for not racing would presumably be lack of soundness.

In light of the common opinion that horses today are being bred for speed rather than soundness, the question of which sires pass on soundness is a good one. And the approach of looking at starts per offspring is valid. But a lot more than that goes into doing statistically valid research. Next post, I'll talk a little about how a study could be designed that would eliminate or reduce the impact of each of the four problems I discussed above.

Wednesday, May 7, 2008

ThoroughMetrics

One of the inspirations for starting ThoroughMetrics was my belief that well-designed statistical research can help performance in almost any area of study. One of the most well-known (and well-studied) subjects is baseball. The statistical study of baseball (known as sabermetrics) was popularized by Bill James, and became more widely known through Moneyball, which told the story of how a team with limited resources was able to use better statistical analysis as a competitive advantage. While many people have read Moneyball and agree with its conclusions, I think most people take too narrow a view of its lessons. It isn't about specific strategies for building a competitive baseball team...it's about how objective analysis can provide an edge over those who stick to common sense, intuition, and statistically invalid, biased research. It's a lesson that can be applied profitably to anything where data can be collected and results depend on the accuracy of forecasts or predictions.