I've continued working on the earnings per start research that I discussed recently here. Some of the new findings:
1. Looking at the overall mean earnings per start for a sire's offspring (total earnings divided by total starts), rather than the mean of each offspring's earnings per start, did result in a slightly better predictive result (a crop-to-crop correlation of .45, instead of .40). Put more clearly (I hope), it was better to weight the EPS by race rather than by horse. However, this still wasn't as good a forecasting tool as the median (which had a crop-to-crop correlation of .50). I definitely need to try out something called a "weighted median" so that I can try weighting the median by race instead of by horse. I expect this to provide an even more accurate prediction. And in fact, I'm guessing that weighting each race evenly isn't optimal either. We learn more about a horse's ability from its first or second race than we do from its twentieth race, and the statistics will probably reflect that.
2. Using only the median data, comparing crops that were more than one year apart (i.e., looking at the correlation between a sire's 1997 and 1999 crops) SEEMS to show a slight trend towards lower correlation than between crops in consecutive years. The effect isn't clear enough to be definitive, but it would make some sense, since over time the quality of the mares sent to a sire may change.
3. Also using only the median data, I tried using two-year averages (means) of the individual year medians as a predictor of the following year (for example, the average of the 1998 median and the 1999 median to predict 2000 results). It didn't seem to work any better or worse than a single year. Even a three-year average didn't work any better. This surprised me a little, but what may be happening is that the benefit of the greater sample size that the multi-year averages give us is being cancelled out by the effect I described in #2 above. There's also a good possibility that this is simply due to random chance, and that in larger studies the two-year averages would have a slight benefit.
4. The most interesting result of the new work I did was that the average of the 1996 and 1997 medians was a GREAT predictor of the average of the 1998 and 1999 medians. Even without refining the 'predictor' to use a weighted median and to adjust for quality of mares, the correlation of the sires' earnings per start medians between these two two-year sets of data was .77.
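To make the "by horse" versus "by race" distinction from #1 concrete, here is a rough Python sketch of the four summary numbers for one sire. The offspring figures are invented purely for illustration, and `weighted_median` is just one common definition (the smallest value at which the cumulative weight reaches half the total), not necessarily the one I'll end up using.

```python
from statistics import mean, median

def weighted_median(values, weights):
    """Lower weighted median: the smallest value at which the
    cumulative weight reaches half of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("values must be non-empty")

# Hypothetical offspring of one sire: (total earnings, number of starts).
# These numbers are made up for the example.
offspring = [(120000, 30), (4000, 4), (0, 2), (15000, 10), (900, 3)]

eps = [earnings / starts for earnings, starts in offspring]
starts = [s for _, s in offspring]

# Weighted by horse: every offspring counts once.
mean_by_horse = mean(eps)
median_by_horse = median(eps)

# Weighted by race: every start counts once.
mean_by_race = sum(e for e, _ in offspring) / sum(starts)
median_by_race = weighted_median(eps, starts)

print(mean_by_horse, mean_by_race, median_by_horse, median_by_race)
```

To weight early races more heavily than later ones, the `starts` list could be swapped for some diminishing function of the number of starts. What that function should look like is an open question.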
Obviously these results have suggested all sorts of productive follow-up research. One of the things I'll need to look at is how the multi-year studies I did for median would perform for mean...since that's really what people are looking at when they look at a sire's career AEI.
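For anyone who wants to try the multi-year comparison themselves, this is roughly how the crop-to-crop correlation could be set up in Python. The sire names and per-crop numbers below are made up, and `crop_stat` is a hypothetical table that could hold either the median or the mean EPS for each crop, so the same function would cover both versions of the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def window_average(by_year, years):
    """Mean of a sire's per-crop statistic over the given crop years."""
    return sum(by_year[y] for y in years) / len(years)

def crop_correlation(crop_stat, predictor_years, target_years):
    """Correlate each sire's average over predictor_years with
    its average over target_years."""
    xs = [window_average(s, predictor_years) for s in crop_stat.values()]
    ys = [window_average(s, target_years) for s in crop_stat.values()]
    return pearson(xs, ys)

# Hypothetical per-crop EPS statistic (median or mean) for each sire.
# Invented numbers, for illustration only.
crop_stat = {
    "Sire A": {1996: 2100, 1997: 1900, 1998: 2300, 1999: 2000},
    "Sire B": {1996: 800, 1997: 1200, 1998: 900, 1999: 1100},
    "Sire C": {1996: 1500, 1997: 1400, 1998: 1300, 1999: 1700},
    "Sire D": {1996: 3000, 1997: 2500, 1998: 2600, 1999: 2900},
}

one_year = crop_correlation(crop_stat, [1997], [1998])
two_year = crop_correlation(crop_stat, [1996, 1997], [1998, 1999])
print(one_year, two_year)
```

With only four invented sires the output means nothing, of course. The point is just the shape of the calculation: one value per sire for the predictor window, one for the target window, then a single correlation across sires.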