How To Choose Factors That Predict Winners
When you start to analyse a race do you know what factors you should be using?
Do you know which are the most important for the race you’re analysing?
Not all races are made equal, which means every race is likely to have its own set of important factors.
When you think about factors, you can probably only come up with a handful at first. Some of those may be speed, pace, class, form and connections.
But they’re not really factors.
They’re categories.
The factors are what fall into each of those categories. If we take speed, then some of those factors may be…
Recent Speed
Average Speed
Best Speed Over Going Conditions
Best Speed Over Distance
Average Speed Over Going Conditions
Average Speed Over Distance
And even these can be broken down into more factors. Taking Recent Speed, we could have several definitions of what that means, such as…
Recent Speed Over Last 7 Days
Recent Speed Over Last 14 Days
Recent Speed Over Last 30 Days
Recent Speed Over Last 60 Days
Recent Speed Over Last 90 Days
And even then we could have several definitions of what ‘Recent Speed’ actually means. We know it means how fast the horse has been running recently, but how do we define that? It could be the fastest the horse has run over the timeframe, or its most common speed weighted towards the most recent race, or any of two or three other variations.
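To show how much the definition matters, here is a minimal Python sketch of two of those readings of Recent Speed over a chosen window (7, 14, 30, 60 or 90 days). It assumes each past run is stored as a hypothetical (date, speed figure) pair where a higher figure means faster, and it uses a simple linearly weighted average as one possible reading of "weighted towards the most recent race". It is an illustration, not the author's own definition.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical record: one past run as (race date, speed figure).
Run = tuple[date, float]


def runs_in_window(runs: list[Run], today: date, days: int) -> list[Run]:
    """Keep only runs from the last `days` days (e.g. 7, 14, 30, 60 or 90)."""
    cutoff = today - timedelta(days=days)
    return [run for run in runs if cutoff <= run[0] <= today]


def recent_speed_best(runs: list[Run], today: date, days: int) -> float | None:
    """Reading 1: the fastest figure the horse has run in the window."""
    window = runs_in_window(runs, today, days)
    return max((speed for _, speed in window), default=None)


def recent_speed_weighted(runs: list[Run], today: date, days: int) -> float | None:
    """Reading 2: an average weighted towards the most recent runs.

    Runs are sorted oldest first; the oldest gets weight 1 and the
    newest gets weight n (an assumed, linear weighting scheme).
    """
    window = sorted(runs_in_window(runs, today, days), key=lambda run: run[0])
    if not window:
        return None
    weights = range(1, len(window) + 1)
    total = sum(w * speed for w, (_, speed) in zip(weights, window))
    return total / sum(weights)


if __name__ == "__main__":
    runs = [(date(2024, 5, 1), 80.0), (date(2024, 5, 20), 90.0), (date(2024, 6, 1), 85.0)]
    today = date(2024, 6, 10)
    for days in (7, 30, 90):
        print(days, recent_speed_best(runs, today, days), recent_speed_weighted(runs, today, days))
```

The same horse gets a different Recent Speed depending on both the window and the definition, which is exactly why the number of candidate factors grows so quickly.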
What we’re left with now is a huge number of factors.
When we analyse a race we want to try and get the number of factors we’re using down to around 10, and certainly no more than 20.
So how do you determine which are the best to use for the specific race conditions you’re analysing?
You could simply use your gut instinct, and there’s definitely nothing wrong with that. It’s quick, and the more you do it the better you will get at it. However, there is a learning curve, and it will take time to apply it every day. If you rush, you’re likely to make mistakes.
But today I want to share with you the approach I use to find the factors I’m going to use to analyse a race.
I start by considering the race conditions. I look at the:
Race Type
Course
Number of Runners
Distance
Class
Going
With that information I go and find a range of past races that match similar conditions. If there are too many to choose from, I add the race’s classifications to my filter; if there aren’t enough, I look at races over a slightly longer or shorter distance and with slightly more or fewer runners.
How many races do you need? As a general rule I like to get as much data as possible, but if you’re doing it manually you should have at least ten races.
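The matching step could be sketched like this in Python. It assumes a hypothetical `Race` record holding the six conditions listed above, with distance in furlongs. The widening steps (1 furlong and 2 runners at a time, up to three times) are illustrative guesses rather than fixed rules, and the narrowing step for when there are too many races is left out.

```python
from dataclasses import dataclass


@dataclass
class Race:
    race_type: str
    course: str
    runners: int
    distance_f: float  # distance in furlongs
    race_class: int
    going: str


def similar_races(target: Race, history: list[Race], min_races: int = 10,
                  max_widen: int = 3) -> list[Race]:
    """Find past races that match the target's conditions.

    Starts with an exact match on distance and number of runners, then
    widens both tolerances one step at a time until at least `min_races`
    races are found or `max_widen` steps have been tried.
    """
    matches: list[Race] = []
    for step in range(max_widen + 1):
        dist_tol = 1.0 * step   # assumed: widen by 1 furlong per step
        runner_tol = 2 * step   # assumed: widen by 2 runners per step
        matches = [
            r for r in history
            if r.race_type == target.race_type
            and r.course == target.course
            and r.race_class == target.race_class
            and r.going == target.going
            and abs(r.distance_f - target.distance_f) <= dist_tol
            and abs(r.runners - target.runners) <= runner_tol
        ]
        if len(matches) >= min_races:
            break
    return matches
```

If the loop runs out of steps it returns whatever it found, so check the count before trusting the analysis.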
Once I’ve got those races, I look for the horses that contended. These are the horses that performed well, not just the ones that won the race.
I always prefer to use contenders rather than winners in analysis, because whether a horse won the race or not can come down to luck. If they get blocked in the closing stages, they lose; if they get bumped… they lose. But that doesn’t mean they didn’t run a strong race, or that they wouldn’t have won if that hadn’t happened.
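The post doesn’t give an exact definition of a contender, so this sketch assumes a hypothetical rule: a runner contended if it finished in the first three or was beaten no more than 2 lengths. Both thresholds are parameters you would tune for yourself.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    name: str
    position: int          # finishing position, 1 = winner
    beaten_lengths: float  # distance behind the winner (0.0 for the winner)


def contenders(runners: list[Runner], max_position: int = 3,
               max_beaten: float = 2.0) -> list[Runner]:
    """Return runners that placed in the first `max_position` or finished
    within `max_beaten` lengths of the winner (hypothetical thresholds)."""
    return [r for r in runners
            if r.position <= max_position or r.beaten_lengths <= max_beaten]
```

A horse that was blocked and finished fourth, beaten a length and a half, still counts under this rule, which is the point of using contenders rather than winners.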
Once I’ve separated out the horses that contended in each race, it’s time to look at all the factors you have for them and determine which are strongest.
And how do you do that?
Start by breaking each factor down into twelve ‘buckets’.
If your factor is a ranking factor that lists horses from best to worst, with 1 being the best, then the buckets run from rank 1 upwards.
This is the easiest approach, and it’s why I suggest you start by using only factors that rank the horses in the field from best to worst. Your twelve ‘buckets’ would then be…
<=1
2
3
4
5
6
7
8
9
10
>=11
Horses With No Rating
Never forget to include the horses with no rating. They can give you as much information as the horses that do have one.
To determine the most important factors to use in the race, you now calculate a Chi Score (based on the chi-square statistic) for each of those buckets.
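A minimal sketch of the twelve-bucket scheme above for a ranking factor, assuming ranks are whole numbers with 1 as the best and that a missing rating is stored as `None` (a hypothetical convention).

```python
from typing import Optional

NO_RATING = "Horses With No Rating"


def rank_bucket(rank: Optional[int]) -> str:
    """Map a horse's rank on a factor (1 = best) to one of twelve buckets."""
    if rank is None:
        return NO_RATING  # unrated horses get their own bucket
    if rank <= 1:
        return "<=1"
    if rank >= 11:
        return ">=11"
    return str(rank)


# All twelve buckets, in the order listed above.
BUCKETS = ["<=1"] + [str(n) for n in range(2, 11)] + [">=11", NO_RATING]
```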
The result of this will tell you which of your factors are likely to have the most impact on predicting the winner of the race!
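The calculation isn’t spelled out here, so the following is only one common way to do it, not necessarily the author’s: a simplified chi-square comparison of observed versus expected contenders in each bucket. It assumes each runner from the matched races has been reduced to a (bucket, contended) pair. The expected number of contenders in a bucket is the number of runners in it multiplied by the overall contender rate, each bucket scores (O − E)² / E, and a factor’s total is the sum over its buckets. The function names and the cut-off of 20 factors (taken from the 10 to 20 guideline earlier) are illustrative.

```python
from collections import Counter


def bucket_chi_scores(rows: list[tuple[str, bool]]) -> dict[str, float]:
    """Chi-square contribution of each bucket for one factor.

    `rows` holds one (bucket, contended) pair per runner across all of the
    matched past races.
    """
    if not rows:
        return {}
    runners = Counter(bucket for bucket, _ in rows)
    observed = Counter(bucket for bucket, contended in rows if contended)
    rate = sum(observed.values()) / len(rows)  # overall contender rate
    scores: dict[str, float] = {}
    for bucket, n in runners.items():
        expected = n * rate  # contenders expected if the factor meant nothing
        if expected > 0:
            scores[bucket] = (observed[bucket] - expected) ** 2 / expected
    return scores


def rank_factors(factor_rows: dict[str, list[tuple[str, bool]]],
                 keep: int = 20) -> list[tuple[str, float]]:
    """Sum each factor's bucket scores and keep the `keep` highest totals."""
    totals = {name: sum(bucket_chi_scores(rows).values())
              for name, rows in factor_rows.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:keep]
```

A high bucket score means horses in that bucket contended far more or far less often than expected, so check the direction as well as the size. With only ten or so races, expected counts per bucket can be small, which makes the scores less reliable.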
If you’d like to know how to perform the Chi Score calculation, leave me a comment, and if there’s enough interest I’ll write it up in next week’s post.
I would love to know how to calculate the Chi Score, and thanks again for another interesting topic.
Thanks Richard
Yes always interested in other ways Michael, so would be interested in how to calculate the Chi Score.
Thank you Wendy
yep bring it on Michael, thanx
It would be very interesting to see. Thanks
Think Michael covered this here http://www.cyobs.co.uk/using-chi-square.pdf
You’re absolutely right, I did. I will rewrite it in case it needs bringing up to date.
Interesting stuff again Michael – thanks.
Unfortunately (for you) it highlights the labour-intensive element, which is identifying and analysing past similar races. Unfortunate for you because it prompts me to ask how you can facilitate the ‘match similar & analyse’ function within the ratings. Many of us simply don’t have the time to carry out this work within a reasonable time frame across a sufficient number of races (and many races will reveal no meaningful pattern at all).
For what it’s worth, and with the above in mind, I apply the same few criteria to all races and manipulate the ratings to produce a top three for each race. I’m currently achieving a strike rate of around 60% (+/- 10%) for the top three, with a level-stakes (LS) loss of around 10% at SP. I suspect that with pattern matching and using Betfair I could turn this into a worthwhile profit. I’d be interested to know your thoughts.
As you say Keith, it is labour intensive. I will, however, look for a way to do it with less labour. Interesting that you’re already using it. At a 10% loss to SP you would probably be near break-even at Betfair SP or BOG, which is a great start. Using pattern matching you should be able to increase your strike rate to 70%+ for the top three.
And is it possible for you to show a working example of the factors/buckets please?
Thanks
Absolutely. So let’s say we’re looking at Factor 1 and the ratings go from 0-100 with no decimals. The buckets would be 0-10, 11-20, 21-30, 31-40 etc…
You would repeat this process with Factor 2, and so on. Ideally you want to set your buckets so that there are a similar number of runners in each one. That means if a rating goes from 0-100 but most horses fall between 40-60, you would use wider ranges outside that band and narrower ones inside it, to even out the number of runners in each bucket.
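Both bucketing approaches from this reply can be sketched in Python. `fixed_width_bucket` reproduces the 0-10, 11-20, 21-30 ranges for whole-number ratings from 0 to 100, while `equal_count_edges` is one assumed way of choosing uneven boundaries so that each bucket holds roughly the same number of runners. Heavily tied ratings can still leave some buckets uneven.

```python
import bisect


def fixed_width_bucket(rating: int) -> str:
    """Fixed ranges for whole-number ratings 0-100: 0-10, 11-20, ..., 91-100."""
    if rating <= 10:
        return "0-10"
    low = (rating - 1) // 10 * 10 + 1
    return f"{low}-{low + 9}"


def equal_count_edges(ratings: list[int], n_buckets: int) -> list[int]:
    """Upper edge of each bucket, chosen so buckets hold similar numbers of runners."""
    if len(ratings) < n_buckets:
        raise ValueError("need at least one rating per bucket")
    ordered = sorted(ratings)
    edges = [ordered[i * len(ordered) // n_buckets - 1] for i in range(1, n_buckets)]
    edges.append(ordered[-1])
    return edges


def equal_count_bucket(rating: int, edges: list[int]) -> int:
    """Index of the first bucket whose upper edge is at or above the rating."""
    return min(bisect.bisect_left(edges, rating), len(edges) - 1)


if __name__ == "__main__":
    ratings = [5, 42, 45, 48, 50, 52, 55, 58, 60, 95]
    edges = equal_count_edges(ratings, 5)
    print(edges)  # [42, 48, 52, 58, 95]
    print([equal_count_bucket(r, edges) for r in ratings])  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With ratings bunched between 40 and 60, the outer buckets come out wide (up to 42, and 59 to 95) and the inner ones narrow, which is the effect described above.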
Yes always interested