Things are rarely as easy as they should be when doing research on thoroughbred racing and breeding, and a lot of the blame for that has to fall on those who have the data. Rather than making it available (at a price) in its 'raw' form, so that we can do the research WE find interesting, they insist on spoon-feeding it in the format that THEY find interesting, which often makes it nearly impossible to get the information we want.
One example: I'm interested in seeing how successful the most popular statistics for ranking sires (AEI and AEI/CI) are in predicting future success. What I'd like to do is get a list of sires along with their AEI and AEI/CI and see what kind of correlations the statistics show over time. The problem is that I need to look at the data BY CROP. If I go to Bloodhorse.com, the data is presented by year (never mind that it's in a PDF file and I need to spend time on 'data hygiene' to even get it into a usable format). Looking at the data by year, of course there will be a high correlation from one year to the next...because we're comparing some of the same horses to themselves. Smart Strike's results in 2007 and 2008 will both include Curlin...that's obviously going to push the correlation between his figures for those two years higher, but what is it really telling us? Not much.
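To make the by-crop comparison concrete, here is a minimal Python sketch. It assumes per-crop AEI figures are already in hand, which is exactly the data that is hard to come by, and it only illustrates the comparison itself, not any way of deriving per-crop AEI. The sire names, crop years, and numbers are invented placeholders, and the function names `pearson` and `consecutive_crop_pairs` are hypothetical helpers written for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def consecutive_crop_pairs(aei_by_crop):
    """Pair each sire's AEI for one crop with his AEI for the next crop.

    aei_by_crop maps sire name -> {crop year: AEI for that crop only}.
    Because each crop is a different set of foals, no horse is counted
    on both sides of a pair.
    """
    pairs = []
    for crops in aei_by_crop.values():
        for year, aei in crops.items():
            next_aei = crops.get(year + 1)
            if next_aei is not None:
                pairs.append((aei, next_aei))
    return pairs


# Invented placeholder figures, NOT real statistics.
sample = {
    "Sire A": {2003: 2.0, 2004: 1.0, 2005: 3.0},
    "Sire B": {2003: 1.0, 2004: 2.0},
}

pairs = consecutive_crop_pairs(sample)
r = pearson([a for a, _ in pairs], [b for _, b in pairs])
print(f"{len(pairs)} crop-to-crop pairs, correlation = {r:.3f}")
```

With real per-crop figures for a few hundred sires, the same pairing would show how well one crop's AEI predicts the next, without the same runners inflating the result.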
What we really need is the AEI and AEI/CI data separated out by crop. I think I may have figured out a way to calculate this, and I'll share it in a few days, once I've had the chance to test it out.