The other three issues I identified for my study of which sires produce sound offspring are actually relatively simple to address. As a reminder, here they are:
2. Was the sample size sufficient to draw meaningful conclusions? (The poster actually mentioned this one.)
3. Were results being skewed because some horses in the data might not have finished their racing careers yet? If so, that would have more impact on younger sires, who have fewer crops that have completed their careers.
4. Were unraced horses included in the data? In most cases, the reason for not racing would presumably be lack of soundness.
Sample size is easy to address: just include a relatively full set of data. I'm mostly interested in sires that have been successful to some degree, so all of them will have had many foals in each crop. I'll look at all of those foals, except where data is hard to find (for example, horses that raced overseas).
Horses that haven't finished their careers yet won't skew the study as I'm approaching it, because I'm looking at year-by-year results rather than 'full career' results. Once a season is over, its results are complete, whether or not the horse keeps racing afterwards. Yet another advantage to doing it this way!
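To illustrate why season-by-season scoring sidesteps unfinished careers, here is a minimal sketch. It assumes a hypothetical record format (sire, horse, year, starts that year) and a hypothetical "sound season" threshold of at least five starts; both are placeholders for illustration, not the measure this study will necessarily use.

```python
from collections import defaultdict

# Hypothetical record format: (sire, horse, year, starts_that_year).
# Each row is one horse's activity in one completed racing season.
records = [
    ("Sire A", "Horse 1", 2021, 8),
    ("Sire A", "Horse 1", 2022, 2),
    ("Sire A", "Horse 2", 2022, 7),
    ("Sire B", "Horse 3", 2022, 1),
    ("Sire B", "Horse 4", 2022, 9),
]

# Hypothetical threshold for a "sound" season: at least this many starts.
MIN_STARTS = 5

def seasonal_soundness(rows, min_starts=MIN_STARTS):
    """Share of each sire's runners that reached min_starts in each season.

    Every season is scored on its own, so a horse that is still racing
    contributes to finished seasons without waiting for its career to end.
    """
    totals = defaultdict(int)
    sound = defaultdict(int)
    for sire, _horse, year, starts in rows:
        key = (sire, year)
        totals[key] += 1
        if starts >= min_starts:
            sound[key] += 1
    return {key: sound[key] / totals[key] for key in totals}

for (sire, year), rate in sorted(seasonal_soundness(records).items()):
    print(f"{sire} {year}: {rate:.0%}")
```

With the sample rows this prints 100% for Sire A in 2021 and 50% for both sires in 2022. Horse 1 counts in both of its seasons even though its career may not be over.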
I definitely want the data to reflect horses that never raced. Other people have published this figure (a sire's % starters), but I think it's an important part of the soundness picture, so I'll include it too.
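A sire's % starters is simply starters divided by foals, so unraced horses count against the sire automatically. The sketch below pools a sire's crops before dividing. The record layout and all numbers are hypothetical. It also assumes that only crops old enough to have had a chance to race are included, since a crop that hasn't reached racing age would drag the figure down for reasons unrelated to soundness.

```python
# Hypothetical crop records: (sire, crop_year, foals_in_crop, starters).
crops = [
    ("Sire A", 2019, 100, 72),
    ("Sire A", 2020, 90, 70),
    ("Sire B", 2019, 60, 39),
]

def percent_starters(foals, starters):
    """Percentage of foals that made at least one start."""
    if foals == 0:
        raise ValueError("crop has no foals")
    return 100.0 * starters / foals

def sire_percent_starters(rows):
    """Pool each sire's crops: total starters / total foals * 100."""
    foals = {}
    starters = {}
    for sire, _year, n_foals, n_starters in rows:
        foals[sire] = foals.get(sire, 0) + n_foals
        starters[sire] = starters.get(sire, 0) + n_starters
    return {s: percent_starters(foals[s], starters[s]) for s in foals}

print(sire_percent_starters(crops))
```

Pooling weights every foal equally; averaging the per-crop percentages instead would give a small crop the same weight as a large one.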