One of the more popular (and controversial) factors that many people use in breeding decisions is the past success of 'nicks' between various bloodlines. Some people wouldn't consider a mating where the nick has been unsuccessful in the past, while others think that the entire theory is pseudo-science with no predictive power. One of the most popular ways of using nicks is to use the ratings provided by Truenicks. Their ratings are provided free for stallions whose farms have 'sponsored' them, while they are sold for around $20 per hypothetical mating for other stallions. The Truenicks people use the following statistics to back up their claim that their ratings are a powerful predictive factor in the success of matings:
"1. While only 13% of the entire Thoroughbred population earn “A” rankings (A to A++), 37% of the stakes winners rate as “A’s.”
2. Horses rated “B” or better (B to A++) represent just 30% of the entire population, yet 3 out of 4 (77%) stakes winners rank “B” or better.
3. Almost half of Thoroughbreds in general–44%–are on the low end of the scale (rated “C” through “F”), yet only two in 25 stakes winners (8%) have these lower rankings."
There's just one problem. As I've discussed in the past, this type of analysis is "cheating". Those who develop automated trading systems to predict movements in financial markets are all too familiar with the pitfalls of 'overoptimizing' a theory to fit past data. The resulting systems do a great job of predicting the past, but have little predictive power going forward. The problem here is that the ratings themselves are based on the same data that is being used to demonstrate their success. To do a valid study of their predictive power, we'd need to record the ratings at some point in time and then track the success of horses with various ratings going forward.
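To make the "cheating" concrete, here is a toy simulation in Python. Everything in it is an assumption made for illustration: the number of crosses, the foals per cross, the 5% stakes-winner rate, and the rule that a cross earns an "A" when it has at least 4 past stakes winners. It is not how Truenicks computes its ratings. In this made-up world every cross has exactly the same chance of producing a stakes winner, so nicks have no real effect at all.

```python
import random

def simulate(n_crosses=400, foals_per_cross=50, p_win=0.05, a_cutoff=4, seed=1):
    """Toy model (all numbers hypothetical): no cross is truly better than another."""
    rng = random.Random(seed)

    def winners_per_cross():
        # Every foal has the same chance of becoming a stakes winner.
        return [sum(rng.random() < p_win for _ in range(foals_per_cross))
                for _ in range(n_crosses)]

    past = winners_per_cross()      # data used to build the ratings
    future = winners_per_cross()    # fresh foals from the same crosses

    # Rate a cross "A" purely on its past stakes winners.
    a_rated = [w >= a_cutoff for w in past]

    # Crosses all have the same number of foals, so this is also
    # the share of the foal population that carries an "A" rating.
    pop_share = sum(a_rated) / n_crosses

    def a_share(counts):
        # Share of all stakes winners that come from "A" crosses.
        return sum(c for c, is_a in zip(counts, a_rated) if is_a) / sum(counts)

    return pop_share, a_share(past), a_share(future)

pop_share, in_sample, out_of_sample = simulate()
print(f"'A' share of population:             {pop_share:.0%}")
print(f"'A' share of past stakes winners:    {in_sample:.0%}")
print(f"'A' share of future stakes winners:  {out_of_sample:.0%}")
```

Measured against the same data that produced the ratings, the "A" crosses should look heavily over-represented among stakes winners. Measured against the fresh foals, their share of stakes winners should fall back to roughly their share of the population. The first comparison is the kind the quoted statistics make; the second is the kind that tests predictive power.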
The great news is that we have the data available to do that! Because of the free ratings available for many stallions, we can create a list of ratings for several hundred (or more) horses now, and then track their performance on the track over the coming months and years. It will be several years before we have the final results of the study, but the amount of work involved isn't overwhelming, and it will be a fun study to follow as the results begin to come in. I'll talk more in future posts about the specific study design I'll be using for this, and how I'll deal with a few of the issues or pitfalls that it will face.
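As a rough sketch of the bookkeeping such a study needs (not the final study design, which is still to come), the tally could look something like the Python below. The function name, the horse names, the ratings, and the outcomes are all placeholders invented for illustration. The key point is that the ratings are written down first and never revised, and the stakes winners are filled in later.

```python
from collections import Counter

def shares_by_rating(snapshot, stakes_winners):
    """For each rating, return (share of all horses, share of stakes winners).

    snapshot: dict of horse name -> rating recorded at the start of the study
    stakes_winners: set of horse names that later won a stakes race
    """
    population = Counter(snapshot.values())
    winners = Counter(r for horse, r in snapshot.items() if horse in stakes_winners)
    total = len(snapshot)
    total_winners = sum(winners.values())
    return {
        rating: (
            population[rating] / total,
            winners[rating] / total_winners if total_winners else 0.0,
        )
        for rating in sorted(population)
    }

# Placeholder data: ratings recorded up front, outcomes added years later.
snapshot = {
    "Horse 1": "A", "Horse 2": "A", "Horse 3": "B", "Horse 4": "C",
    "Horse 5": "C", "Horse 6": "C", "Horse 7": "B", "Horse 8": "D",
}
stakes_winners = {"Horse 1", "Horse 4"}

result = shares_by_rating(snapshot, stakes_winners)
for rating, (pop, win) in result.items():
    print(f"{rating}: {pop:.1%} of horses, {win:.1%} of stakes winners")
```

If the ratings have real predictive power, the second number should run well above the first for the top ratings and well below it for the bottom ones, just as in the quoted statistics, but this time on horses whose results played no part in the ratings.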