While I truly believe that statistical analysis can ultimately answer many questions about what factors impact breeding and racing results, I also believe that most of the answers are not as straightforward as they seem.
As an example, someone on one of the discussion boards I visit recently asked, "Which sires consistently produce sound offspring?" In response, someone else posted a list of some well-known sires, along with the average number of starts for each sire's offspring. As a general approach, this seems reasonable. And it's far better than going purely on reputation. That said, none of the following potential sources of bias were addressed:
1. Were the offspring of better sires retired earlier because of breeding value, rather than unsoundness?
2. Was the sample size sufficient to draw meaningful conclusions? (The poster actually mentioned this one.)
3. Were results being skewed by the fact that some horses in the data might not have finished their racing careers yet? If so, that would have more impact on the data for younger sires, who have fewer crops that have completed their careers.
4. Were unraced horses included in the data? In most cases, the reason for not racing would presumably be lack of soundness.
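
To give a feel for how much items 3 and 4 alone can move the number, here is a rough Python sketch. The records are invented for a made-up sire, and the function name `average_starts` and its two flags are my own illustration, not anything from the discussion board post. It only shows how the same offspring produce different "average starts" depending on who gets counted.

```python
from statistics import mean

# Hypothetical records for one made-up sire: (career starts, still racing?).
# These numbers are invented purely for illustration.
offspring = [
    (0, False),   # never raced
    (0, False),   # never raced
    (12, False),
    (25, False),
    (3, True),    # career not finished yet
    (8, True),    # career not finished yet
    (30, False),
    (5, False),
]


def average_starts(records, include_unraced=True, include_active=True):
    """Return (average starts, number of horses counted) under the given rules."""
    kept = [
        starts
        for starts, still_racing in records
        if (include_unraced or starts > 0)
        and (include_active or not still_racing)
    ]
    return (mean(kept) if kept else None), len(kept)


all_foals = average_starts(offspring)
starters_only = average_starts(offspring, include_unraced=False)
retired_only = average_starts(offspring, include_active=False)

for label, (avg, n) in [
    ("all offspring", all_foals),
    ("starters only", starters_only),
    ("finished careers only", retired_only),
]:
    print(f"{label}: {avg:.1f} starts per horse (n={n})")
```

With these made-up numbers, the average is about 10.4 starts across all eight offspring, about 13.8 if the two unraced horses are dropped, and 12.0 if the two horses still racing are dropped. Nothing about the sire changed, only the inclusion rule.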
In light of the common opinion that horses today are being bred for speed rather than soundness, the question of which sires pass on soundness is a good one. And the approach of looking at starts per offspring is valid. But a lot more than that goes into doing statistically valid research. In my next post, I'll talk a little about how a study could be designed to eliminate or reduce the impact of each of the four problems listed above.