The ThoroughMetrics Blog: November 2008

Wednesday, November 26, 2008

Or Maybe Not...

Argh! The calculations I discussed in my last post may have been based on some faulty assumptions about how sire statistics are reported. Can anybody answer the following question...When BRIS presents statistics on 'leading second crop sires', are they including data only on the horses that are actually part of the sire's second crop, or data on both the first and second crop children of sires whose second crop are currently racing? I had assumed that it's the former, but after looking over the data some more I'm thinking it must be the latter since it doesn't seem like ANY horses have more starters and winners in their 'second crop' year than their 'first crop' year. If so, that's incredibly frustrating and is almost certainly going to mean that research in this area will involve paying for a substantial amount of data and doing a lot more work than I had anticipated. Why does it always seem like BRIS and Equineline go out of their way to present their data in a way that makes it useless for any serious research?

Finally Getting Somewhere

As my regular readers (all four of you) know, one of the research projects I'd like to tackle is an evaluation of the existing measures of breeding success, and ultimately coming up with some statistics that do a better job of predicting future success. As I mentioned yesterday, the main challenge has been finding the data I need to study it. This morning I had a very minor breakthrough. Although BRIS is the data provider for The Bloodhorse, they don't always present the data in the same format. It turns out that BRIS has a version of the leading freshman, leading second crop, and leading third crop sires where they include SPI. As far as I can tell, SPI appears to be the same thing as AEI, although I haven't yet checked the numbers to see that the calculations are being done exactly the same.

I took the list of leading first crop sires of 2007 and the list of leading second crop sires of 2008, both ranked by SPI. Of the 75 names on each list, there were 59 in common. The correlation between the SPIs was .65. Removing the horses with very few starts (fewer than 20 starts in the 2007 list and fewer than 30 starts in the 2008 list) didn't make much difference. The remaining 34 horses had a correlation for their SPIs of .63. On the surface, these correlations look pretty high, and indicate consistent performance by sires across crops. Of course, the problem is that a lot of that consistency may be coming from the fact that some sires are covering better mares than others, so we'll need to make an adjustment for that before we can have any confidence in the results we're seeing.
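For anyone who wants to try the same kind of comparison, here is a rough sketch in Python of the steps described above: match the sires that appear on both lists, optionally drop the ones with too few starts, and compute the correlation between their SPIs. The sire names and numbers are made up purely for illustration (they are not BRIS data), and the function names are my own.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def matched_spi_correlation(first, second, min_first=0, min_second=0):
    """Correlate SPIs for sires found on both lists.

    Each list maps sire name -> (spi, starts). Sires below the
    minimum number of starts on either list are dropped.
    Returns (number of matched sires, correlation).
    """
    common = sorted(
        name for name in first
        if name in second
        and first[name][1] >= min_first
        and second[name][1] >= min_second
    )
    xs = [first[name][0] for name in common]
    ys = [second[name][0] for name in common]
    return len(common), pearson(xs, ys)

# Hypothetical data: name -> (SPI, starts). Not real sires.
first_2007 = {
    "Sire A": (1.8, 45), "Sire B": (1.2, 30), "Sire C": (0.9, 12),
    "Sire D": (0.6, 25), "Sire E": (1.5, 18),
}
second_2008 = {
    "Sire A": (1.6, 80), "Sire B": (1.3, 55), "Sire C": (1.4, 20),
    "Sire D": (0.7, 40), "Sire F": (1.0, 60),
}

n_all, r_all = matched_spi_correlation(first_2007, second_2008)
n_cut, r_cut = matched_spi_correlation(first_2007, second_2008, 20, 30)
print(n_all, round(r_all, 2))
print(n_cut, round(r_cut, 2))
```

Note that this only measures raw consistency. It does nothing about the mare quality problem, which would need its own adjustment.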

Tuesday, November 25, 2008

Data...Looking For Help

The single greatest impediment to doing statistical research for the thoroughbred industry is the lack of quality data. That may seem surprising, considering that racing is a very data intensive sport. But what I've found is that virtually every study I'm interested in doing requires data that either needs to be painstakingly entered into spreadsheets or databases manually, or simply isn't available. In many cases, the data is out there, but not in a form that's usable for statistical research. For example, I've got a copy of the American Produce Record...but the data in there can only be searched one horse at a time...not exactly ideal for statistical research where I'm looking for patterns over samples of hundreds or even thousands of horses. I've talked about some research I'd like to do evaluating the predictive value of AEI and AEI/CI...someone with direct access to the BRIS or Equineline databases could do the study in about ten minutes. Unfortunately, neither organization thinks that's how we should be accessing their data, so I would apparently need to pay for the data per horse. Again, not exactly ideal for statistical research.

If any of you have data that you can share that you think might be useful for me, I'd love to hear about it. I'd be happy to use it for research that you might find useful or interesting too.

Saturday, November 15, 2008

Vacation

I'll be away from Sunday until Friday, so there won't be any new posts until after I return.

Friday, November 14, 2008

Dehere

In the True Nicks Blog at The Bloodhorse today they discuss how the entire future of Dehere as a sire was changed when his daughter Arrested Dreams lost the 1998 Matron by a nose. The loss cost Dehere the chance to be the leading freshman sire, and ultimately probably resulted in his being exported to Japan (and not getting to cover the same quality of mares that he would have) before his value as a sire was recognized several years later. I can't really comment on whether this analysis of events is accurate, but if so it would show how unbelievably superficial a lot of the pedigree analysis in the thoroughbred industry is, and how bad the tools that many use for major financial decisions are. Does anyone REALLY believe that an evaluation that comes up with a substantially different answer depending on which horse finishes a nose ahead in one race has any possible predictive value for the future?

Tuesday, November 11, 2008

2009 Kentucky Derby

Yes, I know it's early to be thinking about the Derby, but I am anyway. Like breeding and owning horses, handicapping can offer some nice opportunities based on people's inability to correctly take sample size into account in their analysis. The top two two year old colts in training in the US (Vineyard Haven and Midshipman) are being sent to Dubai to prepare for the 2009 Derby. The common thinking is that because this "has never been successful", it can't work. On the surface, this makes sense. The two year olds that have followed this path before have always either missed the Derby entirely or not performed well. My instinct upon hearing that two more top colts were being shipped out was something along the lines of "shoot, there go two more horses we'll never hear from again". On the other hand, when everybody has the same strong emotional reaction, it's worth thinking about logically to see if there may be an opportunity. Here are a few reasons to think that the trip to Dubai may not be the kiss of death that most people think:

1. Limited sample size. How many horses have actually followed this path? Ten? Fifteen? In any case, the number is small enough that on average, we'd expect maybe one Derby winner out of the group and one or two other good performances. Godolphin may just be suffering a run of bad luck.

2. Vineyard Haven and Midshipman are far more accomplished than most of the other two year olds that were sent to Dubai. Most (such as Etched and Numaany last year) have simply been impressive maiden winners. The two colts this year are both multiple Grade 1 winners.

3. Conditions have changed. The first colt I remember following this path was Worldly Manners. When he won the UAE Derby, I'm fairly certain I remember hearing that the entire field was owned by the ruling family of Dubai. That's got to make you wonder how tough a prep race it was back then. Now it's a world class race against Southern Hemisphere three year olds (four year olds by US standards). So it's likely that horses following the Dubai route to the Derby now are getting much better preparation than ten years ago, and the sample of valid comparisons is even smaller.
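To put some rough numbers on the sample size point in reason 1, here is a quick sketch in Python. It assumes each horse sent to Dubai has the same, independent chance of winning the Derby, which is obviously a simplification, and the 7% figure is just a number I picked for illustration, not an estimate from any data.

```python
def prob_no_winner(n_horses, p_win):
    """Chance that none of n_horses wins, if each has an independent
    probability p_win of winning (a simplifying assumption)."""
    if n_horses < 0 or not 0.0 <= p_win <= 1.0:
        raise ValueError("n_horses must be >= 0 and p_win between 0 and 1")
    return (1.0 - p_win) ** n_horses

def expected_winners(n_horses, p_win):
    """Average number of winners we'd expect from the group."""
    return n_horses * p_win

# Hypothetical per-horse chance of winning the Derby.
P_WIN = 0.07

for n in (10, 15):
    print(
        n,
        "horses: expected winners =", round(expected_winners(n, P_WIN), 2),
        "| chance of zero winners =", round(prob_no_winner(n, P_WIN), 2),
    )
```

With a group this small, a complete shutout is far from a freak result even if every horse had a legitimate chance, which is why "it has never worked" isn't much evidence on its own.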

Based on all this, I'd suggest keeping an eye on Vineyard Haven and Midshipman for an opportunity to get good value betting on them. If they're part of the 'pool' for Derby futures, that could be your chance. Or if they make it to the starting gate after performing well in the Dubai prep races, they may go off at higher odds than they deserve.

Friday, November 7, 2008

More on Trainer Winning Percentage

In my previous post I mentioned that trainer winning percentages appear to be very stable from year to year. For the 73 trainers who were among the top 100 in the US in wins in both 2005 and 2006, there was a correlation of .83, which means that the list is exceptionally stable. The highest percentage trainers are pretty much the same from one year to the next to an extreme degree. In fact, it was so extreme that I could see the consistency just eyeballing the numbers. Interestingly, the correlation for 'in the money percentage' was slightly lower at .79 for reasons which aren't totally clear to me. I had assumed that it would be a little higher since the sample size of horses in the money is greater.

One thing to keep in mind is that the exact correlation should probably be taken with a grain of salt, since there is a selection bias at work here. The value is probably being artificially increased by the fact that we're not looking at the trainers who had a good enough winning percentage to get them into the top 100 in wins one year, but not the other. This is likely being offset to some degree by the fact that we're also excluding a much greater number of trainers whose winning percentages weren't high enough to make the list in either year. By doing so, we're creating a more homogeneous group, which tends to lead to lower correlation values. Either way, if there were no correlation, we'd expect to see a value of about 0 in our sample, and a value of .83 clearly indicates a great deal of consistency. It's likely that this is due partly to a trainer's "skill level", but even more to how aggressively they place their horses.
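The point about a homogeneous group lowering the correlation can be illustrated with a toy simulation in Python. Everything here is an assumption made for illustration: each simulated trainer gets a fixed "true" winning percentage, each year's observed percentage is that value plus random noise, and the "top" group is picked on the observed percentage itself rather than on wins (a simplification of how the real lists are built). None of the numbers come from the actual trainer data.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate(n_trainers=2000, top_n=200, seed=1):
    """Compare the year-to-year correlation for all simulated trainers
    with the correlation for those in the top group in both years."""
    rng = random.Random(seed)
    # Hypothetical true win rates, plus independent noise each year.
    skill = [rng.gauss(0.15, 0.04) for _ in range(n_trainers)]
    year1 = [s + rng.gauss(0.0, 0.02) for s in skill]
    year2 = [s + rng.gauss(0.0, 0.02) for s in skill]

    full_r = pearson(year1, year2)

    ids = range(n_trainers)
    top1 = set(sorted(ids, key=lambda i: year1[i], reverse=True)[:top_n])
    top2 = set(sorted(ids, key=lambda i: year2[i], reverse=True)[:top_n])
    both = sorted(top1 & top2)  # trainers in the top group both years

    restricted_r = pearson([year1[i] for i in both],
                           [year2[i] for i in both])
    return full_r, restricted_r, len(both)

full_r, restricted_r, n_both = simulate()
print("all trainers:   r =", round(full_r, 2))
print("top both years: r =", round(restricted_r, 2), "n =", n_both)
```

In a setup like this, the correlation within the top group would generally be expected to come out lower than the correlation across everyone, simply because the trainers in that group are so similar to each other. How much of that applies to the real top 100 lists is something the toy model can't tell us.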

Thursday, November 6, 2008

Trainer Data

I'm in the middle of doing some research on the future success of horses after they're claimed, based on the 'type' of trainer they're claimed from. I'm defining those trainer types based on their winning percentages and their average earnings per start. While I'm going to be charging for the end results of my research, I've compiled aggregate data for all trainers who were among the top 100 in the U.S. (ranked by number of wins) during any year from 2004 to 2006. If anyone is interested in a spreadsheet with this data, I'd be happy to share it for free...just send me an email or put a request with your contact information in the comments here. One interesting thing that's showing up already is that trainer winning percentage appears to be very, very consistent from year to year. I'll be discussing that in more detail once I get to do a little more analysis.

Tuesday, November 4, 2008

Auction Strategy

Sorry about the lack of recent posts. I'll see if I can make up for it over the next few weeks.

In his latest blog post at The Bloodhorse, Scot Gillies asked readers to weigh in with their thoughts on strategy for buying broodmares at auction. I thought I'd post my response here.

Basically, I think we need a little more information than he's provided...what is his total budget? Does that include the money allocated for ongoing expenses? What price range are the sixteen mares he's looking at likely to fall within? What are his goals for his purchases? Is he planning to sell the offspring of his broodmare, or to race them?

Assuming that his budget is typical of a small-time owner, and not someone like IEAH or Zayat, and that he's breeding to race, I think he's better off going for a single broodmare. The expenses on two horses will be so much higher than on one that each one is likely to have a negative contribution to profitability.

Also, I think he should wait until later in the sale, when the bargains are likely to be steeper. Yes, he may miss out entirely on getting one of his targets. But really, what's the harm in that? There's always another sale, with more mares available. In the meantime, it looks really unlikely that the overall economy and/or the 'thoroughbred economy' will recover so fast that bargains won't still be available in a few months.