regress to the mean??
chuck
15th April 2005, 03:47.37 PM
Hi all,
I'm a long-time user who has just gotten back into using HTR.
I took a year's worth of data and ran some queries that looked at different factors broken down by track and nDS. What I was looking for: were there subsets of data where one factor did better at a particular track/nDS? Of course, I found that rEP and rFR1 were good at some of the more speed-favoring tracks. But here is the interesting thing: at some tracks, just playing one of the more universal factors, like C90, was profitable.
For example, at Finger Lakes, if you played the #1 ranked C90 on dirt sprints, you would have made money: 1008 plays, a win% of 28.67 and an roi of 1.01.
Here is the question: why FL dirt sprints and not dirt routes? Is 1000 plays enough to say that it will work this year? Or will it regress to the average win% and roi?
How do you folks handle things like this? How do you know that the spot plays will work going forward?
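One rough way to frame the "is 1000 plays enough" question is to put an error band around the win%. The sketch below is only an illustration, not anything from HTR: it uses the textbook normal approximation for a proportion, and the function name `win_rate_interval` is made up for this example. It says nothing about the roi, which also depends on the spread of payoffs.

```python
import math

def win_rate_interval(wins, plays, z=1.96):
    """Approximate 95% interval for a win rate (normal approximation)."""
    p = wins / plays
    se = math.sqrt(p * (1 - p) / plays)  # standard error of a proportion
    return p - z * se, p + z * se

plays = 1008
wins = round(0.2867 * plays)  # the posted 28.67% works out to about 289 winners
low, high = win_rate_interval(wins, plays)
print(f"{wins} wins in {plays} plays: plausible win% from {low:.1%} to {high:.1%}")
```

With these figures the band comes out a bit under three points either side of 28.67%, so a sample of this size still leaves a fair amount of room for the true win% to differ from the observed one.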
fred4now
15th April 2005, 04:46.52 PM
Hey Chuck,
Although I know that some factors do better at some tracks (or at least I think so) I usually don't bet only those tracks. I run my queries across all tracks and they seem to hold up better going forward.
The latest thing I have been doing to look at backfitted data is to import the daily profit/loss into Excel and put it in a graph. Access can do this also, but not as nicely as Excel. When you look at a graph you can immediately tell whether it was only profitable because of a couple of huge spikes, or whether there is a steady increase. I love the trendline feature.
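For anyone without Excel handy, the same check can be roughed out in a few lines of Python. This is just a sketch with made-up daily numbers: it builds the running profit/loss, fits a straight least-squares trendline through it, and compares the single best day with the total profit to flag a result that rests on one spike. The `trendline` helper is written for this example.

```python
from itertools import accumulate

def trendline(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up daily profit/loss figures, for illustration only.
daily_pl = [-6.0, 4.0, -2.0, 38.0, -8.0, -4.0, 2.0, -6.0]
cumulative = list(accumulate(daily_pl))  # running bankroll change
slope, intercept = trendline(cumulative)
biggest_day = max(daily_pl)

print(f"total profit: {cumulative[-1]:.2f}, trend per day: {slope:.2f}")
if biggest_day > cumulative[-1]:
    print("the best single day is bigger than the whole profit: spike-driven")
```

In this toy series the one big day accounts for more than the entire profit, which is the "couple of huge spikes" pattern the graph is meant to expose.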
Another thing, regression to the mean.
I haven't played this with real money, but you could take a reliable factor like K and track it in a graph. Over time you should be losing so many cents on the dollar. If you only bet when it was way below the trendline, you would be using regression to the mean to your advantage.
One other thing: you can have a nice play where 5 out of 6 years show almost the same profit, and 1 year in the middle you would have lost your shirt. There is certainly an answer, but you can go nuts trying to figure it out.
OK last thing ;)
Even if a play shows profit over a couple of years I am still going to see how it did last 6 months or 3 months. I really don't like taking a beating for very long.
Victor
15th April 2005, 04:52.39 PM
Short answer: I don't know.
Long answer: I don't know. :)
You may want to consider which factors are involved (and the logic behind them) and decide for yourself.
That's my best answer!
chuck
15th April 2005, 06:33.27 PM
fred4now,
I love the trend line idea. Have you used it with real bets yet? I saw another software vendor do something like that (Netcapper I think).
Makes a lot of sense. You know that the top rated K will win 30% with a roi of .87. If it is currently winning 25% with a roi of .70, you know it is going to come back, and you will be on it for the upside.
MikeDee
15th April 2005, 08:45.03 PM
There used to be a Chuck on HTR a few years ago who did his database in dbase and, I think, was from the Pittsburgh area. Is this you, Chuck? :)
Paladin
15th April 2005, 09:00.48 PM
Hey Chuck,
fred4now wrote <OK last thing ;) Even if a play shows profit over a couple of years I am still going to see how it did last 6 months or 3 months. I really don't like taking a beating for very long.>
I don't like taking a beating for very long either, and this is where the importance of having a system that turns out a lot of plays comes in.
When you want to see (as Fred said) how your plays (system) did for the last 3-6 months, you need enough plays during that time frame to "give it (you) a chance".
The only other way I see is to have such a huge ROI on your system (plays) that even when it's not doing well, you're still making money. I haven't reached there (the Promised Land) yet!
I used to keep narrowing down systems to get a better and better win%/roi, but now I feel that the LESS restrictive the better. The more plays, the less TIME your losing streaks last, and the more you can use the advantages of CHURN.
chuck
15th April 2005, 09:54.33 PM
Hey Mike,
Yeah, it is me. Still using dbase, but recently upgraded to Access.
hurrikane
15th April 2005, 11:43.21 PM
It also helps to break out every day by number of plays and returns.
As Fred says... don't like going long for long. But you have to know how long long really is.
If you are playing a 12% win rate you could go days without a hit, but then BAM.
At around 20% you can still go a couple of days and many, many plays between win streaks.
Good luck.
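To get a feel for "how long long really is", here is a small sketch that works out the chance of hitting at least one losing streak of a given length over a run of plays. It assumes every play wins independently at the same rate, which real races only approximate, and the 300-play horizon and 20-race streak are arbitrary numbers picked for illustration.

```python
def prob_losing_streak(win_rate, plays, streak):
    """Chance of at least one run of `streak` straight losers in `plays` plays."""
    lose = 1.0 - win_rate
    # state[i] = probability the current losing run is i long and no run
    # of `streak` losers has happened yet
    state = [1.0] + [0.0] * (streak - 1)
    for _ in range(plays):
        new = [0.0] * streak
        new[0] = sum(state) * win_rate      # a winner resets the run
        for i in range(streak - 1):
            new[i + 1] = state[i] * lose    # a loser extends the run
        state = new                         # runs reaching `streak` drop out
    return 1.0 - sum(state)

for rate in (0.12, 0.20):
    chance = prob_losing_streak(rate, plays=300, streak=20)
    print(f"win rate {rate:.0%}: chance of a 20-race losing streak in 300 plays = {chance:.1%}")
```

The same function can be pointed at any win rate, streak length and number of plays to size a bankroll against the droughts a play is likely to produce.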
MikeDee
16th April 2005, 05:27.35 AM
Welcome back Chuck,
Now we have 2 comebackers: you and F4N.
JohnB
18th April 2005, 09:49.05 AM
Hey Chuck -
I primarily focus on spot plays, and have had the same questions as you. I agree with the replies on this thread. I have had better success with my spot plays by using very few factors, and playing them across all tracks. I recall a post by Ken in the past that said something to the effect that if your play gets 500 plays per year across all tracks (1000 plays per year would be even better), then there was a decent chance that the play was not a "backfit", and would have a good chance of going forward. One other thing that has helped all of my spot plays is using certain "eliminators" in all my spot plays to weed out low priced horses and horses that win very infrequently (example - I usually throw out rMLO=1 and nQP=0 horses in my spot plays).
I happen to be comfortable with long losing streaks - my favorite spot play has only a 14% win rate, but a huge ROI. Have been playing this play since 2/23/05 - average around 3 plays per day with this spot. Already have had one 28 race losing streak, and other losing streaks of 14, 13, 12, 11, 11, 11. It is not uncommon to go days on this play without a hit (went 10 days without a hit on the 28 race losing streak). However, the hits have been big to make up for the losing streaks: a $40 winner, a $50 winner, and a $160 winner take the sting out of the losing streaks. The key for me with spot plays is to have strong conviction in the play (I spent my first 6 months with HTR exclusively doing research to the tune of 30-40 hours per week, without placing a single bet. I came up with well over 1000 spot plays that were not profitable, but at the same time I have a handful of plays that I am very confident in that I firmly believe will go forward), a bankroll to handle the losing streaks (I generally flat bet to win only 1/2 of 1 percent of bankroll on each spot play), and discipline to punch out the plays each day without fail. Most of my plays are low win %/high ROI plays, so if I miss a bomb my profit is affected big time - I need to be involved on a daily basis.
I have another play that I have researched that wins 19% of the time and does slightly better than break-even, but averages around 15 plays per day across all tracks - I'll start playing that play soon to take advantage of the churn (will start slow, with 1/4 of 1 percent of bankroll to start until I get comfortable with the play). This play projects to win around the same amount of money as my favorite play that gets 3 plays per day, just in a very different way.
On another note - the trend line idea has always made a lot of sense to me, but I could never figure out in my brain how to make it work. What would you do if the last 200 K1 plays were winning 25% with a ROI of 0.80, but the last 400 K1 plays were at 35% with a ROI of 1.05? Which trend would you play?
fred4now
18th April 2005, 10:23.08 AM
Just got back from a motorcycle run over to Yuma.
Chuck, no I haven't bet strictly using the trendline. I do use it to look at my graphs though.
JohnB, I have been playing with spot plays for a while and totally agree "less is more". I try to use as few factors as possible. I don't really know how I would bet using just the trendline (or I would do it). Just another random thought from me.
All of my handicapping comes from a numbers point of view rather than knowledge of horseracing, so I look at things a little differently.
Later guys.
chuck
18th April 2005, 01:55.03 PM
Thanks for the replies.
Fred4now,
I played with the trendline some over the weekend. I could not see where it was any better at predicting that things were going to change for the positive than chance. Let me know if you see anything different.
tomcat
18th April 2005, 03:55.47 PM
Trend line? Is that like the market is down, so it is bound to come up?
fred4now
18th April 2005, 05:14.00 PM
If the market is as steady as the K number, yes :D
Remember a couple of years ago when the stock fell pretty hard, every time it went lower, I bought. Pretty happy about my decisions right now.
Chuck, not sure enough yet, but on my own plays that I have made going forward I am starting to be able to tell when it is too high above the line or too low.
chuck
6th May 2005, 07:42.11 AM
I was reading some posts on the paceadvantage board regarding this topic. I thought they were pretty interesting. Thought I would copy them here:
Jeff P wrote <I've seen enough of my own data to be able to draw the conclusion that I'm not misleading myself. I'm not trying to deduce anything at all from the tiny sample that I've presented here. It's just one of many many examples that I could have pulled from my database. I've looked at enough of my own samples to know that SWINGIN' CHAMP at $54.60, while a nice payoff, really isn't a true outlier. Looking at very large chunks of data, I find that there have been literally hundreds of $30.00 - $60.00 gettable winners using my own models (spot play methods) over the years at the tracks that I follow.>
One of the most deceptive events in any handicapper's life is the use of an unrepresentative data model. Tversky's "Law of Small Numbers" is a good read, as well as any number of business books on the topic of decision making. It is a big, big field, and unless you are aware of all the little mind traps like the recency effect, the availability heuristic, the self-confirming bias, and the rampant avoidance of disconfirming data, it is really easy to fall into a mindset that has a slightly flawed view of reality.
That is not a criticism; I do this for a living, and it is a continual struggle to build effective data models when everyone is screaming for validation of their hopes and wishful thinking. If you think handicappers have a warped view, try a conference room full of MBAs and senior executives who have built a sales projection for a new widget based on the opinions of a handful of paid stooges in a "focus group."
Something I have found really useful in my own analysis of results is using a series of 5% slices. Rather than using the bootstrap algorithm of random sampling with replacement (which uses the entire sample available repeatedly), try generating a series of random samples without replacement, each consisting of 5% of your sample. Obviously, if you have 300 races, each mini-sample is 15 races, with 1000 races it is 50 races. It needs to be without replacement, because what you are looking for is a randomly generated series of models of the same data set. If a bootstrap, or average, or even a regression analysis is used, all you ever discover is what happened. That may or may not reflect reality for the future.
For example, if you slice a data set of 1000 races, the win frequency and ROI should be fairly consistent. If the percentages vary more than 5-10% between samples, it is a sure indication of outliers. Specifically, the samples do not represent the whole population--only a historical (and non-replicable) segment of the population. That is why Quirin's regression models were so badly skewed; I have never heard of anyone who ever made money using his impact values.
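A minimal sketch of the 5% slicing idea, under one reading of it: shuffle the races once and cut them into disjoint slices, so no race appears in two slices. The payoff data is randomly generated for illustration, the function name `slice_stats` is made up, and ROI is taken as dollars returned per dollar bet on $2 win bets to match the roi figures earlier in the thread.

```python
import random

def slice_stats(payoffs, fraction=0.05, seed=0):
    """Shuffle once, cut into disjoint slices, return (win_pct, roi) per slice."""
    rng = random.Random(seed)
    shuffled = list(payoffs)
    rng.shuffle(shuffled)                       # sampling without replacement
    size = max(1, round(len(shuffled) * fraction))
    stats = []
    for start in range(0, len(shuffled) - size + 1, size):
        chunk = shuffled[start:start + size]
        wins = sum(1 for p in chunk if p > 0)
        stats.append((wins / size, sum(chunk) / (2.0 * size)))  # $2 bet per race
    return stats

# Made-up data: 1000 plays, $2 win payoff for a winner, 0.0 for a loser.
data_rng = random.Random(1)
payoffs = [data_rng.choice([6.0, 8.0, 12.0]) if data_rng.random() < 0.28 else 0.0
           for _ in range(1000)]

slices = slice_stats(payoffs)
win_pcts = [w for w, _ in slices]
rois = [r for _, r in slices]
print(f"{len(slices)} slices of {len(payoffs) // len(slices)} races")
print(f"win% ranges from {min(win_pcts):.1%} to {max(win_pcts):.1%}")
print(f"roi ranges from {min(rois):.2f} to {max(rois):.2f}")
```

Running it with different `seed` values gives a different set of slices of the same data, which is one way to see how much the win% and roi move from one mini-sample to the next.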
Understand that my statements are intended to benefit you; if slicing and dicing comes up with a fairly stable and consistent model, I applaud your efforts and congratulate you. If not, at least you will know what kind of steps you need to take to weed out the anomalies that create the distortions.
I didn't learn that in a statistics class. I built a large and really spiffy model of races that showed a very substantial profit; I lost over $3000 before I realized something was seriously wrong and located several outliers that had corrupted the data. Considering that I build data models as a consultant, it was a really stupid mistake. It was also the turning point in my wagering career, because the models I use now accurately (and profitably) reflect reality, both historical and future.
Thanks