Tuesday, November 25, 2008

Data...Looking For Help

The single greatest impediment to doing statistical research for the thoroughbred industry is the lack of quality data. That may seem surprising, considering that racing is a very data-intensive sport. But what I've found is that virtually every study I'm interested in doing requires data that either has to be painstakingly entered into spreadsheets or databases by hand, or simply isn't available. In many cases the data is out there, but not in a form that's usable for statistical research. For example, I have a copy of the American Produce Record, but it can only be searched one horse at a time. That's not exactly ideal for statistical research, where I'm looking for patterns across samples of hundreds or even thousands of horses.

I've talked about some research I'd like to do evaluating the predictive value of AEI and AEI/CI. Someone with direct access to the BRIS or Equineline databases could do the study in about ten minutes. Unfortunately, neither organization thinks that's how we should be accessing their data, so apparently I would need to pay for the data per horse. Again, not exactly ideal for statistical research.

If any of you have data you can share that you think might be useful, I'd love to hear about it. I'd be happy to use it for research that you might find useful or interesting too.

2 comments:

Craig said...

Alex, you're 100% correct. Next to baseball, horse racing is the most data-driven sport there is, yet the information isn't available. Maybe horse racing can learn from baseball and football. It was the information that came first that made fantasy sports popular for those sports, not the other way around.

Thoroughbred Analytics said...

Many thanks for making the sincere effort to explain this. I feel fairly strongly about it and would like to read more. If it's OK, as you gain more in-depth knowledge, would you mind writing more posts like this one with more information?
