Statistics question


Huguenot
15th June 2006, 10:22.21 AM
I'm starting to keep some data in a spreadsheet (averages, ranges, etc. for various things) and wondered what role, if any, the following statistical measures could play in this. Could they help?
Keep in mind I know very little about statistics. I did look up these measures on the Web, so I have a basic grasp of them. The reason I'm asking about these two is that they're on my MS Works spreadsheet, so I might as well start here.

1. Standard Deviation
2. Variances

It would be for such things as:

-- % Energy averages for distances/classes at each track. Right now I am throwing out the top and bottom 25% and averaging the middle 50%, which comes out pretty close to the median.
For example, at Philly Park at 8.3f since April, the average %E pace of the race was 52.89, but the middle 50% averaged about 52.60. I'm trying to see if there's a correlation between the average for that track/class and the race itself, to predict pace pressure.
I use the last race at today's distance/structure/surface unless that race was clearly aberrational. But for pace it's important to stay recent.
Race 4, June 12: the average of the entries was 52.95. That's high. The top closer won at 4-1. Interestingly, the top E horse was 2nd. Talk about your counter-energy for exactas.

Race 2, June 12: the average of the entries was 52.60. Looks honest, right? But here's something most people don't consider with pace: E horses don't run against themselves! So in evaluating how fast the pace setup will be for an E or EP horse, you have to take that horse out of the equation.
In this race, First Impressive had a 56.79. Take him out and he's FACING a field with an average %E of 52.00. That's way below average for 8.3f at Philly this spring. In his last two races he led and died; we're talking 33 and 24 lengths. Today no one gets within 2.5 lengths and he wins by 13 at 12-1.
(NOTE: Redboarding -- didn't handicap the card beforehand.)
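A minimal Python sketch of the two calculations described above, assuming the %E figures are already in a list: the "throw out the top and bottom 25%" average (statisticians call this the interquartile mean) and the leave-one-out field average for an E horse. The function names are hypothetical, and when the number of races isn't a multiple of 4 the trim is only approximate.

```python
from statistics import mean

def interquartile_mean(values):
    """Drop the top and bottom 25% of the sorted values and average the middle 50%."""
    ordered = sorted(values)
    cut = len(ordered) // 4  # how many values to trim from each end
    middle = ordered[cut:len(ordered) - cut]
    return mean(middle)

def field_average_without(values, index):
    """Average %E of the rest of the field, leaving out the horse at `index`."""
    others = values[:index] + values[index + 1:]
    return mean(others)

def field_average_without_from_summary(field_avg, field_size, horse_value):
    """Same leave-one-out average, recovered from the field average and field size alone."""
    return (field_avg * field_size - horse_value) / (field_size - 1)
```

Working backward from the Race 2 numbers, an eight-horse field is the size that makes the 52.60 and 52.00 averages agree (the post doesn't state the field size, so treat 8 as an assumption): `field_average_without_from_summary(52.60, 8, 56.79)` comes out at about 52.00.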

-- Track profiles -- lengths behind at 1st/2nd calls.
Philly 8.3f for April and May. 49 races, average LB at 1st call was 2.25L, but 16 races are above that and 32 are below the average. Taking out the 6 extremes brings it to a 1.9L ave, but even within the remaining 43 races there are 13 above the average (closers), 5 at the average (2.0) and 25 below it.

Can the SDv or variance help me interpret what all this means?
I figure SDv could tell me how evenly distributed the LB were compared to other tracks/distances??
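One way to see what the SD adds to the track-profile counts is a small Python sketch, assuming a list of lengths-behind figures for one track/distance. The SD is in lengths (the same units as the data) while the variance is in squared lengths, so the SD is the easier one to read: a smaller SD means the races cluster more tightly around the average LB. The `tol` window for counting a race as "at the average" is a hypothetical choice.

```python
from statistics import mean, stdev, variance

def lengths_behind_profile(lengths_behind, tol=0.05):
    """Summarize lengths behind at a call for one track/distance.

    tol (hypothetical) is how close to the mean a race must be to count as "at the average".
    """
    avg = mean(lengths_behind)
    return {
        "races": len(lengths_behind),
        "mean": avg,
        "variance": variance(lengths_behind),  # sample variance (divides by n - 1)
        "stdev": stdev(lengths_behind),        # square root of the sample variance
        "above": sum(1 for x in lengths_behind if x > avg + tol),
        "at": sum(1 for x in lengths_behind if abs(x - avg) <= tol),
        "below": sum(1 for x in lengths_behind if x < avg - tol),
    }
```

Running it once per track/distance and putting the `stdev` entries side by side gives the comparison asked about above.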

-- average winning trainer/jockey/ped/workout figures for different tracks and classes.

Be easy on me!! I can't follow much of the math in the other threads.

MVM
15th June 2006, 12:11.16 PM
The primary use of StDev is to estimate just how often a value would occur if the data you were analyzing were a much larger dataset (infinite in theory).

Example:

If you have 30 values ranging from 1-100, StDev could give you an indication of how often you would expect a value of 90 or more to occur if you had, say 100,000 values.
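The 90-or-more example can be sketched with Python's standard `statistics.NormalDist`: fit a normal curve to the sample's mean and standard deviation, then read the share above the threshold off that curve. This is only meaningful if the data are roughly normal; the function names are hypothetical.

```python
from statistics import NormalDist

def estimated_share_at_or_above(sample, threshold):
    """Estimate the fraction of a very large dataset at or above `threshold`,
    assuming a normal distribution with the sample's mean and (n - 1) standard deviation."""
    fitted = NormalDist.from_samples(sample)
    return 1.0 - fitted.cdf(threshold)

def expected_count(sample, threshold, total=100_000):
    """Expected number of values at or above `threshold` out of `total` values."""
    return estimated_share_at_or_above(sample, threshold) * total
```

With 30 values in a list, `expected_count(values, 90)` gives the "how many out of 100,000" estimate from the example.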

If the numbers between 1-100 were random (every value equally likely), the answer to the 90-or-more question above would be roughly .10 (10%). However, a collection of evenly random values like that does not produce what is called a "normal distribution". A normal distribution is what is required for StDev to be of value in estimating likelihoods. Fortunately, most of the items you want to measure are not random in that sense, and may produce a usable distribution.

(Ex. In a normal distribution of values 1-100, values between 40-60 would occur more frequently than values between 20-39 or 61-80. Also, in a normal distribution the chance of getting the value 100 may be 1 in 100,000, whereas in the random scenario it is 1 in 100.)
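To make the evenly-random vs. normal contrast concrete, here is a small Python comparison. The normal curve's centre (50.5) and spread (15) are hypothetical values chosen only for illustration; each whole number x is treated as the interval x - 0.5 to x + 0.5 on the normal curve.

```python
from statistics import NormalDist

def uniform_share(lo, hi, low=1, high=100):
    """Share of whole numbers lo..hi when every value low..high is equally likely."""
    return (hi - lo + 1) / (high - low + 1)

def normal_share(lo, hi, dist):
    """Share of a normal curve covering whole numbers lo..hi (x - 0.5 to x + 0.5 each)."""
    return dist.cdf(hi + 0.5) - dist.cdf(lo - 0.5)

# Hypothetical bell curve centred on the middle of 1-100.
bell = NormalDist(mu=50.5, sigma=15)

for lo, hi in [(40, 60), (20, 39), (61, 80), (100, 100)]:
    print(f"{lo}-{hi}: uniform {uniform_share(lo, hi):.4f}, "
          f"normal {normal_share(lo, hi, bell):.4f}")
```

Under the even spread every band of the same width gets the same share, while the bell curve piles most of its weight into the middle band and leaves almost nothing at 100.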

If you want to use StDev, there are 3 things you should check for beforehand to determine whether or not it is the proper measurement tool. I'm not certain if these functions are in MS Works.
Check to see if the collection of all the values you are using have the following:

1) A sample size >=30
2) Kurtosis >-1 and <1
3) Skewness >-1 and <1

It's not really important to know what these mean, just know that if the dataset does not meet these criteria then StDev will produce misleading results. BTW, in Excel (sorry I know nada re: Works), these functions are Count, Kurt and Skew.
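For anyone working outside Excel (Works, Calc, or a script), here is a Python sketch of the three checks. The skewness and kurtosis functions follow the sample formulas Excel documents for SKEW and KURT (KURT reports excess kurtosis, so a normal distribution scores about 0); the cutoffs are the rule-of-thumb values listed above, and the function names are hypothetical.

```python
from statistics import mean, stdev

def excel_skew(values):
    """Sample skewness, using the same formula as Excel's SKEW (needs at least 3 values)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def excel_kurt(values):
    """Sample excess kurtosis, using the same formula as Excel's KURT (needs at least 4 values)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    total = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

def stdev_looks_usable(values):
    """Rule-of-thumb checks: sample size >= 30, kurtosis and skewness both between -1 and 1."""
    return (len(values) >= 30
            and -1 < excel_kurt(values) < 1
            and -1 < excel_skew(values) < 1)
```

A perfectly even spread such as 1 through 30 has a kurtosis of about -1.2, so `stdev_looks_usable(list(range(1, 31)))` returns False, in line with the point above that evenly random values don't give a normal distribution.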

Also, on the basis of my experience, be careful about combining all types of races when using Energy% or any other type of pace/final ratio (I don't use %E, but something similar).
For both race fractions and winner fractions, races for younger/less experienced horses differ markedly from races for older/more experienced horses, and races for females differ from races for males.

Hope this helps a little at least.

njcurveball
15th June 2006, 01:22.38 PM
Great question and answer in this thread!

You guys are definitely forward thinkers and should be making some real cash with your insights!

keep it coming!
Jim

Huguenot
15th June 2006, 10:37.05 PM
Appreciate the reply -- I also have Office Calc, much more sophisticated than MS Works and that might have those statistics.

Breaking down energy by age/sex etc. is exactly why I am starting to use a spreadsheet. By determining as well as possible what is "normal" for each distance/class/age/sex, I can see how a given %E fits in a race.
The problem is that there are seasonal variations at work here too. Samples from the winter can't be compared to mid-summer, so the sample sizes tend to be low. But even if I have, say, 10 races in MSW/3 yr and up/FillyMare compared to five races in NW1-2/male/3 yr olds, that can be enough to make some observations.
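Breaking the data down by distance/class/age/sex (and season) amounts to grouping before averaging. A sketch in Python, assuming each race is recorded as a (key, %E) pair where the key is whatever combination you group on; the `min_races` cutoff for flagging a thin sample is a hypothetical choice, not a statistical rule.

```python
from collections import defaultdict
from statistics import mean

def group_averages(races, min_races=5):
    """races: iterable of (key, energy_pct) pairs, where key is a tuple such as
    (distance, race_class, age_group, sex, season).
    Returns {key: (count, average, small_sample_flag)}."""
    buckets = defaultdict(list)
    for key, energy in races:
        buckets[key].append(energy)
    return {
        key: (len(vals), mean(vals), len(vals) < min_races)
        for key, vals in buckets.items()
    }
```

Keeping the count next to each average makes it easy to see which groups rest on only a handful of races.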

tbrown
15th June 2006, 11:53.53 PM
Andy,
A couple of things I do: I calculate both the average and the median for samples; this points out wacky values.
I also tend to only use data within +/- one standard deviation of the mean. This is exactly how I came up with +/- 4 instead of 2 for fast/slow race shapes.
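Tom's two habits, sketched in Python with hypothetical function names: compare the average with the median, and keep only values within one (sample) standard deviation of the mean. Under a normal distribution roughly 68% of values fall inside that band.

```python
from statistics import mean, median, stdev

def mean_and_median(values):
    """Return both; a big gap between them hints that a few wacky values are pulling the mean."""
    return mean(values), median(values)

def within_one_sd(values):
    """Keep only the values within +/- one sample standard deviation of the mean."""
    m, s = mean(values), stdev(values)
    return [x for x in values if m - s <= x <= m + s]
```

For example, `within_one_sd([1, 2, 3, 4, 100])` keeps `[1, 2, 3, 4]`: the mean is 22 while the median is 3, exactly the kind of gap that flags a wacky value.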

Huguenot
16th June 2006, 10:23.09 AM
Interesting, Tom. So 4 points works better than 2 on the pace shapes?

tbrown
16th June 2006, 08:12.50 PM
Yes, and I checked at Aqu, Bel, and FL, all separately - almost identical results.

Huguenot
17th June 2006, 12:36.36 PM
Yes, but does it work at Will Rogers Downs?