I finally got my hands on some data that will allow me to do research on the predictive value of various measures of sires' success. I have about ten years' worth of basic data on the performance of the offspring of the top one hundred sires as of a point in time several years ago. The data has a number of minor flaws for my purposes, but it's still a treasure trove of valuable information and will allow me to start answering some of the questions I'm interested in.
I started playing around with it today, and took a look at some data for the 82 sires that had offspring in every crop from 1996 to 2000. So far I've found that when I take the mean of the offspring's earnings per start for each sire and each crop, the correlation between consecutive years averages .40. When I do the same thing with the median of the offspring's earnings per start, the correlation between consecutive years averages .50.
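For anyone who wants to try the same kind of calculation, here is a minimal sketch of one way to do it, not the actual procedure used for the numbers above. It assumes the data has been arranged as a dictionary mapping each sire to its crops, with each crop holding one earnings-per-start figure per foal. The function names, the layout, and the toy numbers are all made up for illustration.

```python
from statistics import mean, median


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def avg_consecutive_correlation(by_sire, years, summarize):
    """Average, over consecutive crop pairs, of the across-sire correlation.

    by_sire:   {sire: {crop_year: [earnings per start, one value per foal]}}
    years:     crop years in order, e.g. [1996, 1997, ...]
    summarize: function that reduces a crop to one number (mean or median)
    """
    # Keep only sires with at least one foal in every crop year.
    sires = [s for s, crops in by_sire.items()
             if all(crops.get(y) for y in years)]
    rs = []
    for y0, y1 in zip(years, years[1:]):
        this_year = [summarize(by_sire[s][y0]) for s in sires]
        next_year = [summarize(by_sire[s][y1]) for s in sires]
        rs.append(pearson(this_year, next_year))
    return mean(rs)


# Invented toy data: two sires have one runaway earner in one crop.
toy = {
    "Sire A": {1996: [1000, 1200, 50000], 1997: [900, 1500, 1100]},
    "Sire B": {1996: [3000, 3500, 2800], 1997: [3200, 2900, 3100]},
    "Sire C": {1996: [500, 700, 600], 1997: [400, 650, 90000]},
}
mean_r = avg_consecutive_correlation(toy, [1996, 1997], mean)
median_r = avg_consecutive_correlation(toy, [1996, 1997], median)
print(f"mean-based r = {mean_r:.2f}, median-based r = {median_r:.2f}")
```

The toy data was deliberately built so that a single big earner scrambles the mean-based ranking while leaving the median-based one alone, so it says nothing about real sires. It only shows the mechanics.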
What does that mean? Actually, we can't really draw any conclusions about the absolute levels of the correlations until we make some adjustments. It's very possible that a large part of the correlation is due to the relative quality of the mares being sent to each stallion. I have the data I need to make some adjustments for that, but it's not in machine-readable form. If anyone with access to a copy of American Produce Records wants to help me enter the CI values for about 20,000 mares, I'm taking volunteers! In the meantime, my findings do show one valuable piece of information: the median of the offspring's earnings per start correlates better from year to year than the mean does.
My next step (which I'll get to in the next few days) is to look at the actual mean earnings per start (in other words, counting each race once, instead of counting each horse once). I believe that will be a closer approximation of the popular AEI, so it will be interesting to see if that performs any better than the measures I've already tried. After that, I'm very interested to see how the different measures do at predicting each other. In other words, is next year's crop's mean earnings per start better predicted by this year's crop's mean earnings per start or by this year's median earnings per start? It would not be a huge surprise if the median turned out to be the best predictor, since it is less affected by a single huge success (like Curlin). It would certainly be an indictment of using AEI to evaluate sires' quality if it can be outperformed by something as crude as the median earnings per start.
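To make the difference between counting each horse once and counting each race once concrete, here is a small sketch of the three measures for a single crop. It assumes each foal is recorded as a hypothetical (total earnings, number of starts) pair. The function names and figures are invented for illustration, and none of this is meant to reproduce how AEI itself is calculated.

```python
from statistics import mean, median


def per_horse_eps(crop):
    """Earnings per start for each foal that actually raced."""
    return [earnings / starts for earnings, starts in crop if starts > 0]


def mean_eps_per_horse(crop):
    """Each horse counts once, however many times it ran."""
    return mean(per_horse_eps(crop))


def median_eps_per_horse(crop):
    """Middle foal's earnings per start; one star barely moves it."""
    return median(per_horse_eps(crop))


def pooled_eps(crop):
    """Each race counts once: total earnings over total starts."""
    total_earnings = sum(earnings for earnings, _ in crop)
    total_starts = sum(starts for _, starts in crop)
    return total_earnings / total_starts


# Invented crop: one big earner with many starts, two modest runners.
crop = [(100000, 10), (2000, 4), (0, 2)]

print(mean_eps_per_horse(crop))    # 3500.0
print(pooled_eps(crop))            # 6375.0
print(median_eps_per_horse(crop))  # 500.0
```

In this made-up crop the pooled figure is pulled furthest toward the big earner, because that horse supplies most of the starts as well as most of the money, while the median ignores it almost entirely.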