## Sunday, December 14, 2008

### Two Steps Forward...One Step Back

It's always frustrating to take a step backwards in any project, but sometimes it's necessary. I noticed that the correlations in my earnings per start data were lower for 1999-to-2000 than for earlier year pairs, and suspected that something funny was going on in the data. It appears I was right. Because the data was collected sometime around late 2002 or 2003, the year 2000 crop had only a very limited number of starts in the data, which likely resulted in more random variation and less correlation to previous years. Because the average correlations I've been reporting include this crop, I'm going to go back and recalculate them without it. What I know is going to happen is that the averages will all go up. What I don't know is whether there was any systematic bias in the results. By having one crop where the horses' four-year-old and up seasons weren't included, were sires whose offspring mature late being 'punished'? Would this have had any impact on the analysis I've been doing? Probably not, but there's certainly enough question that I'll eliminate the year 2000 crop so that the data represents something closer to a complete record of the performance of the included offspring.

## Saturday, December 13, 2008

### Experimental Design

In response to the series of posts on my research into predictive factors for sires' offsprings' earnings per start, 'Winston...not really' suggested the basic design for an experiment involving breeding multiple stallions to the same band of mares, to avoid the possibility that what we're measuring when we look at stallion performance is simply the result of a systematic difference in the quality of mares being sent to them. His suggestion is definitely a good idea...if you have a few billion dollars to spare. For anyone who doesn't, statistical analysis should be sufficient. I've mentioned before that we're going to need to adjust the sire statistics to take the quality of mares into account, and it's something I'd like to tackle in the near future. However, if we're going to adjust for 'quality' of mare, it may make sense to look at how to measure that quality first. Racing success? Previous breeding success? Quality of bloodlines? I know that some studies on the topic have indicated that racing success is more important than bloodlines, but I'm not sure anyone has tried to measure the relative predictive value of racing success versus success of previous progeny...and how the balance changes as a mare ages and we're able to gather data on a slightly larger number of her progeny.

## Monday, December 8, 2008

### EPS Research Update

I've continued working on the earnings per start research that I discussed recently here. Some of the new findings:

1. Looking at the overall mean earnings per start for a sire's offspring, rather than the mean of the offsprings' earnings per start, did result in a slightly better predictive result (.45, instead of .40). Put more clearly (I hope), it was better to weight the EPS by race rather than by horse. However, this still wasn't as good a forecasting tool as the median (which had a crop-to-crop correlation of .50). I definitely need to try out something called a "weighted median" so that I can weight the median by race instead of by horse. I expect this to provide an even more accurate prediction. In fact, I'm guessing that weighting each race evenly isn't optimal either. We learn more about a horse's ability from its first or second race than from its twentieth, and the statistics will probably reflect that.

2. Using only the median data, comparing crops that were more than one year apart (i.e. looking at the correlation between a sire's 1997 and 1999 crops) SEEMS to show a slight trend toward less correlation than between crops in consecutive years. The effect isn't clear enough to be definitive, but it would make some sense, since over time the quality of mares being sent to a sire may change.

3. Also using only the median data, I tried using two year averages (means) of the individual year medians as a predictor of the following year (for example, the average of the 1998 and 1999 medians to predict 2000 results). It didn't seem to work any better or worse than one year alone, and even a three year average didn't improve things. This surprised me a little, but what may be happening is that the benefit of the greater sample size the multi-year averages provide is being cancelled out by the effect I described in #2 above. There's also a good possibility that this is simply random chance, and that over a larger study the two year averages would show a slight benefit.

4. The most interesting result of the new work I did was that the average of the 1996 and 1997 medians was a GREAT predictor of the average of the 1998 and 1999 medians. Even without refining the 'predictor' to use a weighted median and to adjust for quality of mares, the correlation of the sires' earnings per start medians between these two year sets of data was .77.

Obviously these results have suggested all sorts of productive follow-up research. One of the things I'll need to look at is how the multi-year studies I did for median would perform for mean...since that's really what people are looking at when they look at a sire's career AEI.
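The "weighted median" idea from #1 can be sketched in a few lines. This is just my own illustration, with made-up EPS numbers; the idea is that each offspring's earnings per start counts once per start rather than once per horse, so horses with more races pull harder on the median:

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    For the sire statistics, 'values' would be each offspring's earnings per
    start and 'weights' its number of starts, weighting the median by race
    rather than by horse.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= total / 2:
            return value

# With equal weights this reduces to an ordinary median:
print(weighted_median([1000, 2000, 9000], [1, 1, 1]))   # 2000
# Giving the 9000-EPS horse far more starts pulls the weighted median up:
print(weighted_median([1000, 2000, 9000], [2, 3, 20]))  # 9000
```

Extending this to the "early races matter more" idea in #1 would just mean shrinking each race's weight as a horse's career gets longer, instead of weighting every start equally.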


## Thursday, December 4, 2008

### Changes In Behavior

The first study I did for Thoroughmetrics compared the success of horses that had been sold at yearling auctions against the success of those that were sold at two year old in training auctions. What I found was that there was a drastic difference. Not only did those in one of the types of auctions perform better overall (and at all price levels), but the performance was more predictable too...there was a much stronger relationship between price level and performance. Really valuable information for anyone spending hundreds of thousands of dollars (or more) per year at auctions of unraced horses.

That said, it recently occurred to me that there's a real problem with studies which look at any human behavior that changes drastically over time. In this case, the conclusions of the study will only be valid as long as buyers use the same criteria to make their decisions about what a given horse is worth, and as long as sellers enter the same types or quality of horses in each type of auction that they did at the time of the study.

A similar problem exists with any data on the return on investment (ROI) of any handicapping system. Data gathered on the ROI will only be predictive as long as handicappers as a group don't change how they make decisions about how to allocate their money to horses with various characteristics. So although ROI is what the handicapper ultimately cares about, it is also one of the least stable measures to look at.

For handicappers, the solution is to actually create a 'line' based on the key characteristics of the horse. If you know what the odds on a given horse should be, you only need to compare those to the actual odds to know if a bet has a positive expectation. So instead of thinking something like "lone speed in a 6F race on average returns $2.30 for every $2.00 bet, so I'll bet on #4", handicappers should be thinking along the lines of "lone speed in a 6F race increases a horse's chance of winning by 15%...let's see if that makes #4 a good value at his current odds".
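The "make your own line" arithmetic is simple enough to sketch. This is my own toy illustration, not anyone's actual handicapping model, and for simplicity it ignores the track takeout; the 25% win probability for #4 is a hypothetical number:

```python
def fair_odds(win_prob):
    """Fair odds-to-1 implied by an estimated win probability."""
    return (1 - win_prob) / win_prob

def expected_profit(win_prob, actual_odds):
    """Expected profit on a $1 win bet at actual_odds (odds-to-1):
    win actual_odds with probability win_prob, lose $1 otherwise."""
    return win_prob * actual_odds - (1 - win_prob)

# If lone speed (plus everything else) makes #4 a 25% winner, fair odds are 3-1...
print(fair_odds(0.25))            # 3.0
# ...so 4-1 on the board is a positive-expectation bet, and 2-1 is not.
print(expected_profit(0.25, 4))   # 0.25
print(expected_profit(0.25, 2))   # -0.25
```

The point is that the estimated win probability is the stable input; whether a given horse is a bet falls out of comparing the implied fair odds to whatever the crowd happens to be offering that day.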

This type of change in thinking applies to ownership as well. Instead of doing studies that focus on general measures that change (like which type of auctions provide the best value), we're better served by looking at more stable measures. For example, if we find that offspring of AP Indy on average receive Beyer ratings 2 points higher in their races than offspring of Silver Charm, that's not affected by changes in human behavior (unless you think that there's a systematic change in how Beyers are being subjectively adjusted over time). So instead of simply knowing which type of auction tends to provide better value, we could go into any auction and evaluate what each horse is worth based on their Beyer ratings, and whether they're a good value based on the bidding.

I realize that in essence I'm saying that most people have the right approach, and that while the findings of my original study were valuable, they will become less valuable over time as behavior changes. However, where most people go wrong is that they judge each horse's value subjectively, or using flawed statistics that don't have much predictive value. It doesn't help to judge individual value if you don't have a valid model for valuation.


## Wednesday, December 3, 2008

### Earnings Per Start

I finally got my hands on some data that will allow me to do research on the predictive value of various measures of sires' success. I have about ten years worth of basic data on the performance of offspring of the top one hundred sires at a point in time several years ago. The data has a number of minor flaws for my purposes, but it's still a treasure trove of valuable information and will allow me to start answering some of the questions I'm interested in.

I started playing around with it today, and took a look at some data for the 82 sires that had offspring in every crop from 1996 to 2000. So far I've found that the average correlation for consecutive years of the mean of earnings per start of the sires' offspring is .40. The average correlation for consecutive years of the median of earnings per start of the sires' offspring is .50.

What does that mean? Actually, we can't really draw any conclusions about the absolute levels of the correlations until we make some adjustments. It's very possible that a large part of the correlation is due to the relative quality of the mares being sent to each stallion. I have the data I need to make some adjustments for that, but it's not in 'machine readable' form. If anyone with access to a copy of American Produce Records wants to volunteer to help me enter the CI values for about 20,000 mares, I'm taking volunteers! In the meantime, my findings do show one valuable piece of information - that the median earnings per start correlates better from year to year than the mean of the offsprings' earnings per start does.

My next step (which I'll get to in the next few days) is to look at the actual mean earnings per start (in other words, counting each race once, instead of counting each horse once). I believe that will be a closer approximation of the popular AEI, so it will be interesting to see if that performs any better than the measures I've already tried. After that, I'm very interested to see how the different measures do at predicting each other. In other words, is next year's crop's mean earnings per start better predicted by this year's crop's mean earnings per start or by this year's median earnings per start? It would not be a huge surprise if the median turns out to be the best predictor, since it is less impacted by a single huge success (like Curlin). It would certainly be an indictment of using AEI to evaluate sires' quality if it can be outperformed by something as crude as the median earnings per start.


## Tuesday, December 2, 2008

### True Nicks

One of the more popular (and controversial) factors that many people use in breeding decisions is the past success of 'nicks' between various bloodlines. Some people wouldn't consider a mating where the nick has been unsuccessful in the past, while others think that the entire theory is pseudo-science with no predictive power. One of the most popular ways of using nicks is to use the ratings provided by Truenicks. Their ratings are provided free for stallions whose farms have 'sponsored' them, while they are sold for around $20 per hypothetical mating for other stallions. The Truenicks people use the following statistics to back up their claim that their ratings are a powerful predictive factor in the success of matings:

"1. While only 13% of the entire Thoroughbred population earn “A” rankings (A to A++), 37% of the stakes winners rate as “A’s.”

2. Horses rated “B” or better (B to A++) represent just 30% of the entire population, yet 3 out of 4 (77%) stakes winners rank “B” or better.

3. Almost half of Thoroughbreds in general–44%–are on the low end of the scale (rated “C” through “F”), yet only two in 25 stakes winners (8%) have these lower rankings."

There's just one problem. As I've discussed in the past, this type of analysis is "cheating". Those who develop automated trading systems to predict movements in financial markets are all too familiar with the pitfalls of 'overoptimizing' your theory to fit past data. The systems developed do a great job of predicting the past, but have little predictive power going forward. The problem here is that the ratings themselves are based on the same data that is being used to demonstrate their success. To do a valid study of the predictive power, we'd need to look at ratings at some point in time and then track the success of horses with various ratings going forward.

The great news is that we have the data available to do that! Because of the free ratings available for many stallions, we can create a list of ratings for several hundred (or more) horses now, and then track their performance on the track over the coming months and years. It will be several years before we have the final results of the study, but the amount of work involved isn't overwhelming, and it will be a fun study to follow as the results begin to come in. I'll talk more in future posts about the specific study design I'll be using for this, and how I'll deal with a few of the issues or pitfalls that it will face.
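Once the ratings are frozen, scoring them later is easy. A minimal sketch of the tally I have in mind (the record format and the sample results are hypothetical):

```python
from collections import Counter

def stakes_winner_rate_by_grade(records):
    """Share of horses in each rating grade that became stakes winners.

    'records' is a list of (grade, became_stakes_winner) pairs where the
    outcomes were observed AFTER the ratings were recorded, so the ratings
    can't have been fit to these results -- the out-of-sample test that the
    published percentages lack.
    """
    totals, winners = Counter(), Counter()
    for grade, won in records:
        totals[grade] += 1
        winners[grade] += int(won)
    return {grade: winners[grade] / totals[grade] for grade in totals}

# Hypothetical future outcomes for four rated foals:
print(stakes_winner_rate_by_grade(
    [("A", True), ("A", False), ("B", False), ("C", False)]))
# {'A': 0.5, 'B': 0.0, 'C': 0.0}
```

If the ratings have real predictive power, the "A" rates should still beat the "C" rates when the tally is done this way; if they don't, the published figures were an artifact of fitting to past data.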

## Monday, December 1, 2008

### How Much For The Holy Grail?

Ok, it may not be the 'holy grail' of pedigree analysis. In fact, it's really just a starting point, not a final destination. But after my realization a few days ago that sire data by crop simply isn't available for free, no matter how much manual work I'm willing to do, it's beginning to feel like it.

To recap, what I'm hoping to do is evaluate the predictive power of existing measures of sire success such as AEI and AEI/CI, and ultimately to come up with something that works better. In order to do that, I need to get data on multiple crops of offspring for a reasonably large group of sires. Realistically, I'd say I'll need about 50 sires included to have full confidence in my results.

Looking at the BRIS and Equineline web sites, it looks like I can get the data I need for $18 per sire. So it turns out that the holy grail costs about $900. Based on the sample reports on the BRIS site it looks like the data is human readable, but not in a good format for computer processing, so I'll probably have a long, boring data entry project to put it into Excel or Access, but that's something I can live with. If anyone with $900 to spare wants to invest in this project, I'd be happy to share the full results with you, reimburse your investment as I generate sales of the research, and throw in some free advertising on my site (if you run racing partnerships).

