Bayes Theorem In Horse Racing – Everything You Need To Know
Today I’m going to follow on from my original article about Bayes Theorem. I was excited to see that it caused quite a stir and I want to make sure that by the end of today you know exactly how to create the likelihood ratios needed to use it in racing and…
…I’m going to do an example with some real stats that you can take out and start to use immediately!
If you haven’t read my first article on Bayes Theorem then you should check it out here before you continue reading this post.
First of all we need to know how the calculations work. I want to warn you upfront that there is going to be some maths here. It’s nothing complicated but it may require you to go through it a few times.
Don’t let this put you off. If you want to make good profits in this game then you’ll have to get stuck into some basic maths occasionally. Don’t worry if it isn’t your strength, just go through it slowly and if you’ve got any questions then post a comment.
The basic idea of using Bayes Theorem here is that we start by giving every horse in a race an equal chance of winning. We then look at different factors and adjust this chance of winning up or down depending on a horse’s stats for each factor.
To do this we use something known as likelihood ratios. These are a simple way of measuring how much we should adjust a horse’s probability based on knowledge we have discovered.
These ratios are determined by taking the factor, for example horses who won last time out, and seeing how many of these won their next race and how many lost.
Using this information we can create a likelihood ratio that shows us how much this factor should adjust the odds line we are creating on each horse depending on whether it was a last time out winner or not.
We adjust each horse’s odds line and then repeat for the next factor.
This can be done with as many factors as you want, but today I will be doing it with just three to keep it easy to follow.
Calculating these requires some simple maths that can be done with a basic calculator and a pencil.
For each factor we have four pieces of information:
 A = Horses with factor who won their next race
 B = Horses with factor who lost their next race
 C = Horses without factor who won their next race
 D = Horses without factor who lost their next race
Using this information, the calculation that you are going to perform to find out how much a horse’s chance with the factor increases is:
(A x (B + D)) / (B x (A + C))
In other words, it is the share of all winners that had the factor divided by the share of all losers that had the factor.
The calculation you’re going to perform to find out how much a horse’s chance without the factor decreases is:
(C x (B + D)) / (D x (A + C))
If you’re beginning to panic by looking at these letters then stop, take a deep breath and read on.
All you need to do is to replace the letters with the corresponding information from the list above and it tells you exactly what to do.
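As a rough sketch, the two calculations above can be written as a few lines of Python. The function name `likelihood_ratios` and its argument names are my own choices for illustration. The counts in the usage lines are the speed figure counts from the table used later in this article.

```python
def likelihood_ratios(a, b, c, d):
    """Return (ratio_with_factor, ratio_without_factor).

    a: horses with the factor who won their next race
    b: horses with the factor who lost their next race
    c: horses without the factor who won their next race
    d: horses without the factor who lost their next race
    """
    winners = a + c  # all winners in the sample
    losers = b + d   # all losers in the sample
    with_factor = (a * losers) / (b * winners)
    without_factor = (c * losers) / (d * winners)
    return with_factor, without_factor


# Speed figure counts from this article: A=124, B=588, C=728, D=6425
lr_with, lr_without = likelihood_ratios(124, 588, 728, 6425)
print(round(lr_with, 2), round(lr_without, 2))
```

Note that the sketch assumes B and D are not zero. A factor where no horse lost would need handling separately.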
And, I’m going to perform a real life example of this now so that you can follow along.
As we are heading into the jumps season I’ve decided to analyse a race on the All Weather. The reason for this is that the conditions are more stable and there is generally less information to take account of which makes it easier for an example.
So I will be looking at a six runner race on the All Weather that was run on the 4th October.
To start we have to give each horse an equal probability of winning the race. We don’t currently know any information about our factors so they all have an equal probability.
Determining this starting point is simple. We simply take the number 1 and divide it by the number of runners in the race. In this example that is 1 divided by 6 which gives us 0.17.
Each horse will start with a probability of 0.17 (which is the same as 17%) of winning the race.
Now we work out likelihood ratios to determine how we are going to adjust this for each runner.
The factors we are going to use are:
 Speed figure ranking from last race
 Distance winner
 Beaten Favourite
Now that we have determined the factors we are going to use, I’m going to take a sample of runners on the All Weather this year and start with factor 1.
I will look to see how important being the fastest ranked horse in the last race is. We have 7865 runners with a speed figure in the All Weather sample since April 2013.
These 7865 runners are broken down as follows by whether they were ranked 1 for speed figure in their last race:
Ranked 1 SPDFIG LTO  Won Next Race  Lost Next Race  Total 
Yes  124  588  712 
No  728  6425  7153 
Total  852  7013  7865 
Now that we have this data we plug it into our calculation:
Horses with factor = (124 x (588 + 6425)) / (588 x (124 + 728)) = 1.74
Horses without factor = (728 x (588 + 6425)) / (6425 x (124 + 728)) = 0.93
This means that horses with the factor have their chance of winning multiplied by 1.74, and those without the factor have theirs multiplied by 0.93.
To apply this information to our probabilities we simply multiply each horse’s probability by the relevant figure.
In our example race from the 4th October we have the following runners…
Horse  Base Probability  Speed Rank LTO
Caterina De Medici  0.17  1
Silent Movie  0.17  -
Alnawiyah  0.17  3
Fatima’s Gift  0.17  5
Jowhara  0.17  4
Just Darcy  0.17  2
There is just one runner, Caterina De Medici, who ranked 1 for speed figure LTO. Her Base Probability is multiplied by 1.74, while the other runners with a speed rank have their probabilities multiplied by 0.93.
There is one runner, Silent Movie, who doesn’t have a figure. In this situation you can either calculate a likelihood ratio for this scenario or you can leave the probability the same as it was before the factor was applied, which is what we do here.
After making our adjustments we have a new table of…
Horse  Prob After Factor 1  Speed Rank LTO
Caterina De Medici  0.29  1
Silent Movie  0.17  -
Alnawiyah  0.16  3
Fatima’s Gift  0.16  5
Jowhara  0.16  4
Just Darcy  0.16  2
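The adjustment for a single factor can be sketched in Python as follows. The names `apply_factor`, `has_factor` and `top_ranked` are hypothetical and only for illustration. A runner with no data is marked `None` and left unchanged, which is one of the two options mentioned above. The two ratios are computed directly from the speed figure counts in the table.

```python
def apply_factor(probs, has_factor, lr_with, lr_without):
    """Multiply each probability by the ratio matching the horse's flag.

    has_factor maps horse -> True, False, or None (None = no data,
    in which case the probability is left unchanged).
    """
    updated = {}
    for horse, p in probs.items():
        flag = has_factor[horse]
        if flag is None:
            updated[horse] = p
        elif flag:
            updated[horse] = p * lr_with
        else:
            updated[horse] = p * lr_without
    return updated


runners = ["Caterina De Medici", "Silent Movie", "Alnawiyah",
           "Fatima's Gift", "Jowhara", "Just Darcy"]
base = {horse: 1 / len(runners) for horse in runners}

# Speed rank LTO from the article; None marks the runner with no figure.
speed_rank = {"Caterina De Medici": 1, "Silent Movie": None, "Alnawiyah": 3,
              "Fatima's Gift": 5, "Jowhara": 4, "Just Darcy": 2}
top_ranked = {h: (None if r is None else r == 1) for h, r in speed_rank.items()}

# Ratios from the speed figure counts (A=124, B=588, C=728, D=6425).
lr_with = (124 * (588 + 6425)) / (588 * (124 + 728))
lr_without = (728 * (588 + 6425)) / (6425 * (124 + 728))

after_factor_1 = apply_factor(base, top_ranked, lr_with, lr_without)
for horse, p in after_factor_1.items():
    print(f"{horse}: {p:.2f}")
```

The sketch carries the unrounded probabilities forward and only rounds when printing, so small differences from a hand calculation with rounded figures are to be expected.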
Our next factor to consider is whether the horse is a distance winner. We have 7873 runners in our sample. The small difference in sample sizes is due to some of the horses not having a speed figure for us to calculate on in the previous factor. This will not affect the outcome of your calculations.
The data for this factor looks like:
Distance Winner  Won Next Race  Lost Next Race  Total 
Yes  60  479  539 
No  794  6540  7334 
Total  854  7019  7873 
We plug the data for this factor into our calculation:
Horses with factor = (60 x (479 + 6540)) / (479 x (60 + 794)) = 1.03
Horses without factor = (794 x (479 + 6540)) / (6540 x (60 + 794)) = 1.00
You can see straight away that this factor does not affect a horse’s chances by very much, and you may choose not to use it in your own model.
It is always best to stick with factors that make a significant enough difference if possible.
However for this example we shall use the Distance Winner factor even though it makes only a small difference.
Horse  Prob After Factor 1  Distance Winner
Caterina De Medici  0.29  N
Silent Movie  0.17  N
Alnawiyah  0.16  N
Fatima’s Gift  0.16  N
Jowhara  0.16  N
Just Darcy  0.16  N
None of the runners in our example race were distance winners. This means that they all have their probabilities multiplied by 1.00, which leaves them exactly as they were before, with our new table looking like…
Horse  Prob After Factor 1+2  Distance Winner
Caterina De Medici  0.29  N
Silent Movie  0.17  N
Alnawiyah  0.16  N
Fatima’s Gift  0.16  N
Jowhara  0.16  N
Just Darcy  0.16  N
Our third and final factor is whether the horse was a beaten favourite. Our data table for this factor looks like:
Beaten Favourite  Won Next Race  Lost Next Race  Total 
Yes  99  756  855 
No  755  6263  7018 
Total  854  7019  7873 
We plug the data for this factor into our calculation:
Horses with factor = (99 x (756 + 6263)) / (756 x (99 + 755)) = 1.08
Horses without factor = (755 x (756 + 6263)) / (6263 x (99 + 755)) = 0.99
This time we can see that a horse that was a beaten favourite has a small increase in their chances of winning the next race. However it doesn’t reduce the chance of a horse winning significantly if they weren’t a beaten favourite.
Horse  Prob After Factor 1+2  Beaten Favourite
Caterina De Medici  0.29  N
Silent Movie  0.17  N
Alnawiyah  0.16  Y
Fatima’s Gift  0.16  N
Jowhara  0.16  N
Just Darcy  0.16  N
There is just one runner in our example race who was a beaten favourite. This was Alnawiyah. This horse will have her probability multiplied by 1.08 while the others will have their probabilities multiplied by 0.99.
These changes give us a final table of…
Horse  Prob After Factor 1+2+3  Beaten Favourite
Caterina De Medici  0.29  N
Silent Movie  0.17  N
Alnawiyah  0.17  Y
Fatima’s Gift  0.15  N
Jowhara  0.15  N
Just Darcy  0.15  N
Having applied our three factors we can either use each horse’s probability as it is or normalise them so that they add up to a round book of 100% and create an odds line from them.
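A minimal sketch of that normalising step in Python is shown here. The function names `normalise` and `odds_line` are my own. Expressing the odds line as decimal odds (1 divided by the probability) is an assumption made for the example, since the article does not say which odds format to use. The input figures are the final raw probabilities from the table above.

```python
def normalise(probs):
    """Scale probabilities so they sum to 1 (a 100% book)."""
    total = sum(probs.values())
    return {horse: p / total for horse, p in probs.items()}


def odds_line(probs):
    """Convert probabilities to fair decimal odds (1 / probability)."""
    return {horse: 1 / p for horse, p in probs.items()}


# Final raw probabilities from the article's last table.
raw = {"Caterina De Medici": 0.29, "Silent Movie": 0.17, "Alnawiyah": 0.17,
       "Fatima's Gift": 0.15, "Jowhara": 0.15, "Just Darcy": 0.15}

book = normalise(raw)
line = odds_line(book)
for horse in raw:
    print(f"{horse}: {book[horse]:.3f} -> {line[horse]:.2f}")
```

Dividing by the total keeps the ranking of the runners exactly the same. It only rescales the figures so they can be read as a full book.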
Looking at the raw probabilities we can see that based on these factors Caterina De Medici has a significantly larger chance of winning than the other runners.
However we also learned that while being a Beaten Favourite or a Distance Winner is not going to make a significant impact on a horse’s chance of winning or losing in an All Weather race, being top ranked for speed in their last race makes a very significant difference.
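To compare factors side by side, the three data tables can be run through the same calculation in one go. The sketch below is only illustrative. The function name is my own, and ranking factors by how far the with-factor ratio sits from 1 is a crude yardstick I have assumed for the example, not a formal significance test.

```python
def likelihood_ratios(a, b, c, d):
    """Return (ratio_with_factor, ratio_without_factor) from the four counts."""
    winners = a + c
    losers = b + d
    return (a * losers) / (b * winners), (c * losers) / (d * winners)


# (A, B, C, D) counts from the three tables in this article.
factors = {
    "Ranked 1 SPDFIG LTO": (124, 588, 728, 6425),
    "Distance Winner": (60, 479, 794, 6540),
    "Beaten Favourite": (99, 756, 755, 6263),
}

ratios = {name: likelihood_ratios(*counts) for name, counts in factors.items()}

# Sort by distance of the with-factor ratio from 1 (1 = no adjustment).
ranked = sorted(ratios.items(), key=lambda item: abs(item[1][0] - 1), reverse=True)
for name, (lr_with, lr_without) in ranked:
    print(f"{name}: with={lr_with:.2f} without={lr_without:.2f}")
```

A ratio close to 1 means the factor hardly moves the probability, so factors near the bottom of this list are candidates to leave out of a model.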
You can now take this method and information and find the most relevant factors for the race conditions you are interested in. Plug in the figures and start finding the answers to the question… Who Is Going To Win This Race?
Hey Michael. Firstly, another great article, so thanks for that. What are the advantages of this over using Impact Values? Say I had a set of IVs for different race conditions and multiplied them to rate each horse, would using Bayes be a ‘superior’ way of going about things?
Cheers Luke
Thanks Luke. I wouldn’t say one is better or worse, just different. You can actually use IV’s in your Bayes model, for example horses with factor A that have an IV of 1.20 or higher and 1.19 or less.
Thank you very much for sending this article, I learn a lot of things from you
Thanks Regards
susith
Thanks Susith
Hello Michael,
Thank you very much for all of the very useful information that you have been sending. I am afraid that I am struggling with this however…
The mechanics of the calculation are clear to me but it is the speed rank used for factor 1 that I don’t understand. Do you mean the rating given to the horse before the LTO race was run? Would this be the something like the Topspeed rating in the Racing Post? Or is there some other speed rating somewhere?
Hi Sue, thank you for your email. You have it exactly. The speed rating rank is based on the speed rating the horse achieved in its last race. The fastest horse would be ranked 1, the second fastest ranked 2, etc. Don’t forget that the winner doesn’t necessarily have the fastest speed rating once weight etc. has been taken into account.
You could use the TS but in this article I used my own speed ratings which are available as part of the Racing Dossier at http://www.racingdossier.co.uk
Hi Michael
I noticed in one of the above comments that you say once weight has been factored. I believe although I may be incorrect that you never used to factor weight into speed figures, if that being the case may I ask what changed your mind. However if the former is not the case may I ask why would one factor weight into speed figures. Many thanks.
Hi Paul, no you’re right I actually don’t factor weight into my own speed ratings but a lot of people do and it made for the best example. Those who factor it into their speed ratings do so with the idea to account for whether the horse has been over/under weighted and to see how it would have run without that weight.
Great info! Should there be a consistent assignment of values of A,B,C and D? ie. A= those that show the factor and won, B= those that show the factor and lose, C= those that don’t show the factor and won, and D= those that don’t show the factor and lose.
Or would the assignment be different for each equation?
It is a consistent assignment, all that changes is the factor.
I agree with Tom: The assignment is definitely not consistent in this article, in some cases the even sub totals are used rather than the +ve, -ve, with/without.
Hi Joe, where are you seeing the formula not being consistent throughout?
The calculations for the 2nd & 3rd factors are wrong:
the quantities into brackets should be B+D & C+D,
which is not the case.
Could you check and fix that please?
Please could you confirm which of the equations you think is incorrect and what it should be so we can take a look.
ok, I prepare that in the coming hours.
p.s. I made a mistake about the bracketed quantities:
I meant B+D and A+C.
formula used is (A x (B + D)) / (B x (A + C))
speed factor:
Horses with factor = (124 x (588 + 6425)) / (588 x (124 + 728)) = 1.74 is correct:
A=124 B = 588 ok
C=728 D=6425 ok
___________________________
distance winner factor:
Horses with factor = (60 x (479 + 6540)) / (794 x (60 + 479)) = 1.03 ( is it not 0.98 ?)
the data are
A= 60 B= 479
C=794 D=6540
equation should be (60 x (479 + 6540)) / (479 x (60 + 794)) = 1.03
______________________________
beaten favorite factor :
Horses with factor = (99 x (756 + 7019)) / (755 x (99 + 756)) = 1.08
the data are
A= 99 B= 756
C=755 D=6263
equation should be (99 x (756 + 6263)) / (756 x (99 + 755))
Thank you for pointing out the typos sandoz, they have been corrected now.
I have met a problem when counting the number of horses without factor.
Imagine the factor of the horse is jockey="JockeyX".
If the jockey is a flat jockey, should we ignore any non flat races when counting this number?
So, if the counting is made by a program, should we write:
If race is "flat" and jockey is "JockeyX" then
    ……….
else
    horse won: yes +1 -> horsewithoutfactorWin
               no +1 -> horsewithoutfactorLost
endif
OR
If race is "flat" then
    if jockey is "JockeyX" then
        ……….
    else
        horse won: yes +1 -> horsewithoutfactorWin
                   no +1 -> horsewithoutfactorLost
    endif
endif
the number of horsewithfactorWin & horsewithfactorLost will be the same in both cases but not the
nb of horsewithoutfactorWin and nb of horsewithoutfactorLost,
that is, the LR will be different.
Which way of doing is the correct one?
Thanks for the great questions Sandoz. As you may have expected there is no right or wrong answer here, you can do either dependent on which you feel is most relevant. My personal experience is that only considering races over similar conditions for horse factors can make a big difference, connection factors less so. However if you do only consider those races you need to be aware that some runners may have very little data. The best option is often to do both and combine them later on with different weightings.
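For anyone counting with a program, the two approaches in the question can be sketched in Python like this. Everything here is hypothetical: the record fields `race_type`, `jockey` and `won`, the helper names, and the handful of toy records, which are made up purely to show that the two counts come out differently.

```python
from collections import Counter


def count_factor(runs, has_factor, in_scope=lambda run: True):
    """Count A, B, C, D over the runs that pass the in_scope filter."""
    counts = Counter(A=0, B=0, C=0, D=0)
    for run in runs:
        if not in_scope(run):
            continue  # this run is ignored entirely
        if has_factor(run):
            counts["A" if run["won"] else "B"] += 1
        else:
            counts["C" if run["won"] else "D"] += 1
    return counts


# Made-up toy records, not real racing data.
runs = [
    {"race_type": "flat", "jockey": "JockeyX", "won": True},
    {"race_type": "flat", "jockey": "JockeyX", "won": False},
    {"race_type": "flat", "jockey": "Other", "won": False},
    {"race_type": "jumps", "jockey": "Other", "won": True},
    {"race_type": "jumps", "jockey": "Other", "won": False},
]


def is_flat_jockey_x(run):
    return run["race_type"] == "flat" and run["jockey"] == "JockeyX"


# Option 1: every run that is not flat + JockeyX counts as "without factor".
all_races = count_factor(runs, is_flat_jockey_x)

# Option 2: only flat races are counted at all.
flat_only = count_factor(runs, is_flat_jockey_x,
                         in_scope=lambda run: run["race_type"] == "flat")

print(dict(all_races))
print(dict(flat_only))
```

The A and B counts match in both versions while C and D differ, which is why the likelihood ratios differ between the two approaches.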
The probabilities don’t add up to 1.0
Normalise it. Add to 1.08 so top rater is 0.28/1.08 = 25.9%