The ThoroughMetrics Blog

Sunday, December 18, 2016

Earnings-Based Metrics vs. Speed-Based Metrics

Sorry to put a teaser on here, but I'm almost ready to post my next article on Horse Racing Nation. In it, I'll begin sharing some of the results I've found comparing the value of traditional earnings-based metrics for evaluating sires with that of speed-based metrics. Once the article is up, I'll post an update here as well, so that you don't need to keep checking for it.

Wednesday, October 12, 2016

Average Starts: Mean vs. Median

In general, medians tend to be better predictors of the future than means (averages), since they're less influenced by extreme outliers. In many cases, the median can even be a better predictor of the future mean than the mean itself. While I don't think that's the case with average starts per horse, it's worth noting that the median horse in my database has only 12 career starts (compared to a mean of 16). It's not surprising to see the median lower than the mean for a metric whose upper limit is so far above the values for most of the population: a relatively small number of horses with very long careers pull the mean up, while the median stays with the typical horse.
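To see how a long upper tail pulls the mean above the median, here is a minimal Python sketch using a small, made-up list of career start counts (hypothetical numbers, not from the database discussed above): most horses have modest careers and two have very long ones.

```python
from statistics import mean, median

# Hypothetical career start counts for ten horses (illustrative only).
# Most careers are short; two long careers form the upper tail.
career_starts = [3, 5, 7, 9, 11, 12, 14, 18, 45, 60]

print("mean:", mean(career_starts))
print("median:", median(career_starts))

# Dropping the two longest careers moves the mean a lot and the median only a little.
trimmed = career_starts[:-2]
print("mean without tail:", mean(trimmed))
print("median without tail:", median(trimmed))
```

With these numbers the mean falls from 18.4 to about 9.9 when the two long careers are removed, while the median only moves from 11.5 to 10.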

Tuesday, October 11, 2016

Average Starts Per Horse By Crop

Well, this isn’t exactly what I expected to find. I took a look at the average number of starts per horse in each crop, expecting to see a gradual, steady decline over the past 15 years. However, it appears that there hasn’t been much of a decrease since the 2002 crop.

For the 2011 crop, keep in mind that this data was gathered midway through their five-year-old season, so they will likely end up with a number of starts similar to that of previous crops.
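A per-crop average like this is a simple group-by. The minimal Python sketch below assumes one (crop year, career starts) record per horse, with made-up values, and flags crops whose careers are probably still in progress. The data-collection year and the age-six cutoff are arbitrary assumptions for illustration, not rules taken from the analysis above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (crop_year, career_starts) records, one tuple per horse.
records = [
    (2002, 18), (2002, 9), (2002, 21),
    (2006, 15), (2006, 12), (2006, 20),
    (2011, 8), (2011, 14), (2011, 5),
]

# Assumptions: data pulled in 2016; a crop younger than six at that point
# is treated as "careers likely incomplete" (an arbitrary cutoff).
DATA_YEAR = 2016
MIN_COMPLETE_AGE = 6


def average_starts_by_crop(rows):
    """Return {crop_year: mean career starts}, ordered by crop year."""
    starts_by_crop = defaultdict(list)
    for crop_year, starts in rows:
        starts_by_crop[crop_year].append(starts)
    return {crop: mean(starts) for crop, starts in sorted(starts_by_crop.items())}


for crop, avg in average_starts_by_crop(records).items():
    age = DATA_YEAR - crop
    note = "  (careers likely incomplete)" if age < MIN_COMPLETE_AGE else ""
    print(f"{crop} crop: {avg:.1f} starts{note}")
```

Flagging rather than dropping the young crop keeps it visible while making clear that its average is not yet comparable.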

These results definitely make the very high average number of starts that I found for some of the ‘old-time’ sires’ offspring even more impressive, since it appears that the game hasn’t changed quite as much as I’d imagined.

Friday, October 7, 2016

Durability of Offspring of Big Brown and Mr. Prospector

In response to my article on Horse Racing Nation about the durability of Unbridled's Song's offspring, people mentioned Mr. Prospector and Big Brown as two other sires whose offspring might be expected to show signs of unsoundness. My data on both is limited, but I took a look anyway.

Mr. Prospector's offspring showed an average of ten career starts. That certainly suggests some issues, but it may also be due to the very limited sample size (fewer than 100 horses) in my database.

Big Brown's offspring so far are averaging eleven career starts. That sounds bad, but it's actually pretty average for two reasons:
1. I believe his first crop is only six years old now, so a very large percentage of his offspring that have raced have not yet finished their careers.
2. Other sires who began their stud careers in the 2007-2009 period are showing similar averages. This may partly be due to the factor in #1 above, and it may also reflect the tendency for horses to race less than they did even 10-20 years ago.

Tuesday, October 4, 2016

Unraced Sires

Just noticed another of my old posts, addressing the issue of unraced sires (particularly, sons of Storm Cat). Coincidentally, I saw the following article on Bloodhorse today: http://www.bloodhorse.com/horse-racing/articles/215550/unraced-stellar-rain-gets-his-first-winner I'm hopeful that I can do some research and provide some real analysis on the topic sometime soon.

Blood-Ex: Anybody Know What Happened?

I had completely forgotten about this until I looked back at my last few posts from 2008. Does anyone know what happened with Blood-Ex? I vaguely remember them announcing some delays to their launch. They definitely don't seem to exist now, and a quick Google search didn't shed any light on what happened.

Reviving ThoroughMetrics...Again

I've finally gotten access to a good enough database of pedigree information and racing results that I should be able to do a lot of the analysis I talked about years ago. As I work through the data analysis, I'll be posting here somewhat regularly, discussing some of the more interesting findings and issues I run into. I'm also going to be publishing a series of articles at Horse Racing Nation, looking at a variety of questions related to thoroughbred pedigrees.

Sunday, December 14, 2008

Two Steps Forward...One Step Back

It's always frustrating to take a step backwards in any project, but sometimes it's necessary. I noticed that the correlations in my earnings per start data were lower for the 1999-to-2000 comparison than for earlier years, and suspected that something funny was going on in the data. It appears I was right. Because the data was collected sometime around late 2002 or 2003, the 2000 crop had only a very limited number of starts recorded, and this likely resulted in more random variation and less correlation with previous years.

Because the average correlations I've been reporting include this crop, I'm going to go back and recalculate them without it. What I do know is that the averages will all go up, since the comparison being dropped was lower than the others. What I don't know is whether there was any systematic bias in the results. With one crop whose four-year-old and older seasons weren't included, were sires whose offspring mature late being 'punished'? Would this have had any impact on the analysis I've been doing? Probably not, but there's certainly enough question that I'll eliminate the 2000 crop so that the data represents something closer to a complete record of the performance of the included offspring.
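The mechanism described above (fewer recorded starts per horse, noisier crop statistics, lower crop-to-crop correlation) can be illustrated with a small simulation. This is a minimal Python sketch under made-up assumptions, not the data or code behind the correlations reported here: each hypothetical sire has a fixed "true" ability, each foal varies around it, each start's earnings are the foal's ability plus race-to-race noise (on an arbitrary centered scale, so negative values just mean below average), and a crop's statistic is the median of its foals' EPS. Complete crops get 15 starts per foal; the truncated crop, standing in for the 2000 crop, gets one.

```python
import random
from statistics import fmean, median

random.seed(2008)  # fixed seed so the run is reproducible

N_SIRES = 500        # hypothetical number of sires
FOALS_PER_CROP = 20  # hypothetical foals per sire per crop


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def crop_median_eps(ability, starts_per_foal):
    """Median EPS of one simulated crop for a sire with the given ability."""
    foal_eps = []
    for _ in range(FOALS_PER_CROP):
        foal_ability = ability + random.gauss(0, 1)  # foals vary around the sire
        # Each start's "earnings" = foal ability + race-to-race noise.
        races = [foal_ability + random.gauss(0, 3) for _ in range(starts_per_foal)]
        foal_eps.append(fmean(races))
    return median(foal_eps)


abilities = [random.gauss(0, 1) for _ in range(N_SIRES)]
crop_1998 = [crop_median_eps(a, 15) for a in abilities]  # complete careers
crop_1999 = [crop_median_eps(a, 15) for a in abilities]  # complete careers
crop_2000 = [crop_median_eps(a, 1) for a in abilities]   # truncated careers

r_complete = pearson(crop_1998, crop_1999)
r_truncated = pearson(crop_1999, crop_2000)
print(f"1998 vs 1999 (both complete): r = {r_complete:.2f}")
print(f"1999 vs 2000 (2000 truncated): r = {r_truncated:.2f}")
```

Every sire's underlying ability is identical across the three simulated crops, so any drop in correlation comes purely from the extra noise of short records. The sketch does not address the late-maturity question; that would need a model in which ability changes with age.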

Saturday, December 13, 2008

Experimental Design

In response to the series of posts on my research into predictive factors for the earnings per start of sires' offspring, 'Winston...not really' suggested the basic design for an experiment: breed multiple stallions to the same band of mares, to rule out the possibility that what we're measuring when we look at stallion performance is simply the result of a systematic difference in the quality of the mares being sent to them. His suggestion is definitely a good idea...if you have a few billion dollars to spare. For anyone who doesn't, statistical analysis should be sufficient.

I've mentioned before that we're going to need to adjust the sire statistics to take the quality of mares into account, and it's something I'd like to tackle in the near future. However, if we're going to adjust for the 'quality' of a mare, it may make sense to look at how to measure that quality first. Racing success? Previous breeding success? Quality of bloodlines? I know that some studies on the topic have indicated that racing success is more important than bloodlines, but I'm not sure anyone has tried to measure the relative predictive value of racing success versus the success of previous progeny...and how that balance changes as a mare ages and we're able to gather data on a slightly larger number of her progeny.
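One common statistical way to make this kind of adjustment (an illustration, not a method used in these posts) is to regress each foal's EPS on a measure of its dam's quality across all foals, then rate each sire by the average amount its foals beat or miss what their dams alone would predict. In the minimal Python sketch below, the single "dam quality" score and all the numbers are hypothetical placeholders for whichever measure (racing success, produce record, bloodlines) turns out to be most predictive.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical foals: (sire, dam_quality_score, foal_eps). Illustrative only.
foals = [
    ("Sire A", 1.0, 3.0), ("Sire A", 2.0, 5.0), ("Sire A", 3.0, 7.0),
    ("Sire B", 3.0, 6.0), ("Sire B", 4.0, 8.0), ("Sire B", 5.0, 10.0),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def mare_adjusted_ratings(rows):
    """Mean residual per sire: actual EPS minus EPS predicted from dam quality."""
    intercept, slope = fit_line([q for _, q, _ in rows], [eps for _, _, eps in rows])
    residuals = defaultdict(list)
    for sire, quality, eps in rows:
        residuals[sire].append(eps - (intercept + slope * quality))
    return {sire: fmean(r) for sire, r in residuals.items()}


ratings = mare_adjusted_ratings(foals)
for sire, rating in ratings.items():
    raw = fmean(eps for s, _, eps in foals if s == sire)
    print(f"{sire}: raw mean EPS {raw:.1f}, mare-adjusted {rating:+.2f}")
```

Sire B's raw average is higher (8.0 vs. 5.0), but B was also sent better mares; after adjustment, Sire A rates higher (+0.2 vs. -0.2). One caveat: because a single pooled line is fitted, the dam-quality slope partly absorbs sire differences when better sires get better mares, so a fuller model would estimate sire effects and dam quality together.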

Monday, December 8, 2008

EPS Research Update

I've continued working on the earnings per start research that I discussed recently here. Some of the new findings:

1. Looking at the overall mean earnings per start for a sire's offspring (total earnings divided by total starts), rather than the mean of the individual offspring's earnings per start, did give a slightly better predictive result (a crop-to-crop correlation of .45, instead of .40). Put more clearly (I hope), it was better to weight EPS by race rather than by horse. However, this still wasn't as good a forecasting tool as the median (which had a crop-to-crop correlation of .50). I definitely need to try out something called a "weighted median" so that I can weight the median by race instead of by horse. I expect this to provide an even more accurate prediction. In fact, I'm guessing that weighting each race evenly isn't optimal either. We learn more about a horse's ability from its first or second race than from its twentieth, and the statistics will probably reflect that.

2. Using only the median data, comparing crops that were more than one year apart (i.e., looking at the correlation between a sire's 1997 and 1999 crops) SEEMS to show a slight trend toward lower correlation than for crops in consecutive years. The effect isn't clear enough to be definitive, but it would make some sense, since the quality of mares a sire is bred to may change over time.

3. Also using only the median data, I tried using two-year averages (means) of the individual year medians as a predictor of the following year (for example, the average of the 1998 and 1999 medians to predict 2000 results). It didn't seem to work any better or worse than a single year. Even a three-year average didn't work any better. This surprised me a little, but what may be happening is that the benefit of the larger sample size the multi-year averages give us is being cancelled out by the effect I described in #2 above. There's also a good possibility that this is simply due to random chance, and that in larger studies the two-year averages would show a slight benefit.

4. The most interesting result of the new work was that the average of the 1996 and 1997 medians was a GREAT predictor of the average of the 1998 and 1999 medians. Even without refining the 'predictor' to use a weighted median or to adjust for quality of mares, the correlation of the sires' earnings per start medians between these two two-year sets of data was .77.
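Item 1 compares weighting EPS by horse versus by race, and mentions trying a weighted median. The minimal Python sketch below computes all four versions for one sire's offspring using made-up earnings and start counts. The `weighted_median` helper is a hypothetical illustration of one common definition (the "lower" weighted median), not a standard-library function.

```python
from statistics import fmean, median

# Hypothetical offspring of one sire: (career_earnings, career_starts).
offspring = [(10_000, 2), (30_000, 10), (5_000, 5), (120_000, 20)]


def weighted_median(values, weights):
    """Lower weighted median: the smallest value whose cumulative weight
    reaches at least half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("need at least one value")


per_horse_eps = [earnings / starts for earnings, starts in offspring]
starts = [s for _, s in offspring]

mean_by_horse = fmean(per_horse_eps)                       # each horse counts once
mean_by_race = sum(e for e, _ in offspring) / sum(starts)  # each start counts once
median_by_horse = median(per_horse_eps)                    # middle horse
median_by_race = weighted_median(per_horse_eps, starts)    # horse holding the middle start

print(f"mean by horse:   {mean_by_horse:,.0f}")
print(f"mean by race:    {mean_by_race:,.0f}")
print(f"median by horse: {median_by_horse:,.0f}")
print(f"median by race:  {median_by_race:,.0f}")
```

Here the horse with 20 starts accounts for more than half of the sire's 37 starts, so the race-weighted median is simply its EPS. Down-weighting later starts, as suggested at the end of item 1, would mean replacing the raw start counts with a different weight per horse, which this sketch doesn't attempt.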

Obviously these results suggest all sorts of productive follow-up research. One of the things I'll need to look at is how the multi-year studies I did for the median would perform for the mean...since that's really what people are looking at when they look at a sire's career AEI (Average Earnings Index).
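The two-year comparison in item 4, and the follow-up of running it on means instead of medians, can be sketched in Python as follows. Assumptions: a hypothetical nested dict of foal EPS values keyed by sire and crop year, made-up numbers (two sires have one deliberately extreme foal each), and a plain Pearson correlation across sires. This is an illustration of the idea, not the code behind the .77 figure.

```python
from statistics import fmean, median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def block_average(crops, years, stat):
    """Average of stat(foal EPS list) over the given crop years for one sire."""
    return fmean(stat(crops[year]) for year in years)


def block_correlation(sires, early_years, late_years, stat=median):
    """Correlation across sires between early-block and late-block averages."""
    early = [block_average(crops, early_years, stat) for crops in sires.values()]
    late = [block_average(crops, late_years, stat) for crops in sires.values()]
    return pearson(early, late)


# Hypothetical foal EPS values by sire and crop year (illustrative only).
sires = {
    "Sire A": {1996: [2, 4, 9], 1997: [3, 5, 6], 1998: [4, 5, 20], 1999: [2, 6, 7]},
    "Sire B": {1996: [1, 2, 3], 1997: [1, 3, 4], 1998: [2, 2, 5], 1999: [1, 3, 3]},
    "Sire C": {1996: [6, 8, 9], 1997: [5, 7, 30], 1998: [7, 8, 8], 1999: [6, 9, 10]},
}

for name, stat in (("median", median), ("mean", fmean)):
    r = block_correlation(sires, (1996, 1997), (1998, 1999), stat)
    print(f"{name}: r = {r:.2f}")
```

Passing the per-crop statistic as a parameter keeps the median and mean versions of the study identical in every other respect. With this toy data the two outlier foals pull the means around much more than the medians, so the mean-based correlation comes out lower, but three made-up sires say nothing about how the comparison would turn out on real data.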