Monday, 26 August 2013

How To Calculate Probabilities From Bookmakers' Odds

If a fair bet on a horse or football team is paying $4.00, that implies a probability of winning of 1/4. That’s because for every 4 such bets you take at $1.00 each, you’ll expect to win one. So the general principle in an unbiased bet is that the probability of the outcome is 1 / the payout. Alternatively, the payout should be 1 / the probability of the outcome.
But bookmakers always pay less than this. That’s how they make a profit: by biasing the wager in their favour. Equivalently, the probabilities implied by bookmakers’ payouts always add up to more than 1.
The simplest example is the points start bet between two football teams. It’s always set such that the implied chances of either outcome are 50/50. But the payout is somewhere in the range $1.90 - $1.92 per $1 bet, depending on the bookmaker. In such cases, even though the probabilities, say 1/1.9, add to more than 1, it is clear from the equal payoffs that the true implied probabilities are 50% for each team. Note that points starts are always halves eg. +3½, so there is no third possibility of a draw.
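To put a number on the bookmaker's edge in the points start bet, here is a minimal Python sketch. The function names overround and expected_return are my own, chosen for illustration, and the $1.90 payout is just the lower end of the range quoted above.

```python
def overround(payouts):
    """Sum of the payout-implied probabilities (1 / payout) over all outcomes."""
    return sum(1.0 / p for p in payouts)


def expected_return(payout, true_prob):
    """Expected profit per $1 staked, given the true probability of winning."""
    return true_prob * payout - 1.0


line_bet = [1.90, 1.90]                      # both sides of a points start bet
print(round(overround(line_bet), 4))         # 1.0526
print(round(expected_return(1.90, 0.5), 4))  # -0.05
```

So at a true 50/50, each $1 staked loses 5 cents on average, which is the bookmaker's margin on this bet.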
An asymmetric example is a bet on the winner of a football game where one team is expected to win. They might be paying $1.50, their opposition $2.50, with $15 for a draw. The implied probabilities here are 1/1.5 = 2/3, 1/2.5 = 2/5 and 1/15. Notice that they add to 17/15. To get the true probabilities implied by the payoffs, we could naively just multiply by 15/17, so the actual probability of a win for the favourite is 10/17, their opposition 6/17 and a draw 1/17.
Another example might be a horse race in which the runners are paying $3, $4, $6, $8, $8, $12, $24. If you add the implied probabilities, you’ll get 9/8. So, we could just multiply them all by 8/9 to get the actual probabilities implied by the bookie’s odds: 8/27, 2/9, 4/27, 1/9, 1/9, 2/27, 1/27.
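The simple division method used in the football and horse race examples can be sketched in a few lines of Python. The function name normalise is mine, and I use exact fractions only so that the output matches the figures above digit for digit.

```python
from fractions import Fraction


def normalise(payouts):
    """Return (sum of implied probabilities, implied probabilities rescaled to sum to 1)."""
    implied = [1 / Fraction(p) for p in payouts]
    total = sum(implied)
    return total, [q / total for q in implied]


# Football game: favourite, opposition, draw (decimal payouts passed as strings
# so that Fraction reads them exactly).
football_total, football = normalise(["1.5", "2.5", "15"])
print(football_total, [str(p) for p in football])

# Seven horse race.
horse_total, horses = normalise([3, 4, 6, 8, 8, 12, 24])
print(horse_total, [str(p) for p in horses])
```

The two totals come out as 17/15 and 9/8, as in the text.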
Seems easy. Just like picking the winner in the football game, there’s no overwhelming favourite and the true probabilities are all in the ratio implied by the bookmaker’s payouts.
Now consider the case of a two horse race with an overwhelming favourite, for example betting on the election result in a safe seat. Let’s suppose the favourite is paying $1.01 and the challenger $20. The implied probabilities are 100/101 and 1/20. If you look at betting on the federal election on Sportsbet or TAB, you’ll see the challenger is typically paying even less than this.
Let’s try our previous method. The implied probabilities add to 2101/2020, so multiply each by 2020/2101 to get 0.9519 and 0.0481. By this method, the outsider’s true probability is roughly the 5% implied by the $20 payout, but the favourite’s true probability is estimated at just over 95%, despite the odds implying a chance of winning of approximately 99%.
Is this a reasonable outcome? No.
The reason why is that the $20 payout figure is essentially made up. To protect the bookmaker against a freak loss, this payout is well under the payout implied by the true probability of the outsider winning. There is a big difference between a 95% chance and a 99% chance. The odds of the favourite winning really are much closer to the latter.
What this situation is really telling us is that, unlike in the football game or horse race, the two payout values do not contain the same amount of information. In a fair bet on a two horse race, the probability of one outcome contains all the information about both, since the probabilities add to 1. However, bookmakers’ payouts in a two horse race contain independent components of information, since their implied probabilities add to more than 1. We need to adjust the improper, payout implied probabilities to reflect the fact that the $1.01 payout contains more information as to the true probabilities than the $20 payout ie. adjust the favourite’s probability by a little and the outsider’s by relatively more.
One way to do this is to use the log likelihood function. It is the log of the product of the probabilities of all the possible outcomes.
If the payout-implied, improper probabilities are q_1, q_2, …, then the (improper) log likelihood function is
            L_imp = Σ_k log q_k
If we assume the log likelihood function of the true probabilities is a constant C times the above function:
            L = C * Σ_k log q_k = Σ_k log (q_k^C)
this implies the true probabilities are p_k = q_k^C, with the constraint
            Σ_k q_k^C = 1
We can solve this equation for C and thus determine the true probabilities, assuming that larger implied probabilities contain more information.
Applying the method to our {$1.01, $20} race, we obtain C = 1.4234 and actual probabilities of 0.986 and 0.014. Notice that the favourite’s probability is still close to the bookmaker implied 100/101, but the outsider’s estimated probability is now approximately 1/70.
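For anyone who wants to reproduce these figures, here is one way to solve for C numerically. This is a sketch, not necessarily how the numbers above were computed: the bisection search and the names solve_exponent and power_method are my own choices. It assumes every implied probability lies strictly between 0 and 1 and that they add to at least 1.

```python
def solve_exponent(implied, tol=1e-12):
    """Find C >= 1 such that sum(q ** C for q in implied) equals 1, by bisection.

    Assumes 0 < q < 1 for every q and sum(implied) >= 1, so the sum is
    decreasing in C and crosses 1 exactly once.
    """
    def total(c):
        return sum(q ** c for q in implied)

    lo, hi = 1.0, 2.0
    while total(hi) > 1.0:       # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:     # sum still too big, so C must be larger
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def power_method(payouts):
    """Return (C, adjusted probabilities q ** C) for a list of payouts."""
    implied = [1.0 / p for p in payouts]
    c = solve_exponent(implied)
    return c, [q ** c for q in implied]


c, probs = power_method([1.01, 20])
print(c, probs)
```

The printed values should land close to the C and the two probabilities quoted above.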
Let’s take payouts for the perfidious Tony Windsor’s electorate of New England, almost certain to be regained for the Nationals by his arch nemesis and inveterate dill, Barnaby Joyce. They are $1.01 (Joyce), $13 (ALP), $41, $51, $81 for the rest. The implied probabilities add to 1.1234, but simply dividing by this value estimates a true probability of 88% of Joyce winning the seat. This is clearly wrong: he is almost certain to win.
Using the log likelihood method, we obtain C = 1.6936 and fair payout values of $1.017, $77, $540, $780, $1700, with the actual probabilities being the reciprocals of these amounts.
Sportsbet has payouts of $1.001, $15, $26, $34 for New England (the $26 being for any candidate but Joyce, the ALP and Palmer United). Applying the log likelihood method to these gives fair payout values of $1.0024, $600, $2195, $4135, much more realistic, given the bookie’s payout on Joyce is a profit of just 1 cent per $10 bet.
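The New England figures can be checked with a short script. As an alternative to a bracketing search, this sketch uses Newton's method on the constraint that the C-th powers of the implied probabilities sum to 1. Starting from C = 1 is safe here because the sum is a convex, decreasing function of C. The names solve_exponent_newton and fair_payouts are hypothetical, and I am not claiming this is how the values above were originally calculated.

```python
import math


def solve_exponent_newton(implied, max_iter=100, tol=1e-13):
    """Solve sum(q ** C) = 1 for C by Newton's method, starting from C = 1.

    Assumes 0 < q < 1 for every q and sum(implied) >= 1.
    """
    c = 1.0
    for _ in range(max_iter):
        powered = [q ** c for q in implied]
        f = sum(powered) - 1.0
        # Derivative of sum(q ** C) with respect to C.
        df = sum(p * math.log(q) for p, q in zip(powered, implied))
        step = f / df
        c -= step
        if abs(step) < tol:
            break
    return c


def fair_payouts(payouts):
    """Return (C, fair payouts); a fair payout is 1 / q ** C = payout ** C."""
    implied = [1.0 / p for p in payouts]
    c = solve_exponent_newton(implied)
    return c, [p ** c for p in payouts]


for market in ([1.01, 13, 41, 51, 81], [1.001, 15, 26, 34]):
    c, fair = fair_payouts(market)
    print(round(c, 4), [round(x, 4) for x in fair])
```

The reciprocals of each list of fair payouts add to 1, which is a quick sanity check on the solve.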
The main point here is that dividing the improper, payout implied probabilities by their sum to obtain the actual implied probabilities (as in our first few examples) only works as a reasonable approximation when there is no clear separation of the outcomes into two groups such that the winner is overwhelmingly likely to come from one group. The case of a single, overwhelming favourite and a group of also rans is the obvious example. In such cases, the log likelihood method works well. It properly uses the information in the favourite payout and more realistically estimates the total probability of the outsiders.
In cases where there are multiple favourites and the remainder are long outsiders, the accuracy of the simple division method is less clear cut. Consider the example of three equal favourites, each paying $3 and the remainder paying long odds, say $25, $50 and $100.
In this example, the improper probabilities add to 1.07. Dividing by this, we obtain actual probabilities of 100/321 for the favourites and 4/107, 2/107, 1/107 for the others. This gives fair payouts of $3.21, $3.21, $3.21, $26.75, $53.50 and $107; not much different to the originals. At a glance, it’s not obvious there is anything wrong, perhaps because the long odds payouts really are accurate representations of the relative likelihoods of the outsiders versus the favourites winning.
But perhaps they are not. Perhaps the chance of the winner coming from one of the 3 favourites is very high, say 99%. There is simply insufficient information in the payouts to differentiate approximately accurate payouts for outsiders from unrepresentative ones.
For cases with multiple favourites and the remainder long outsiders, the value of C in the log likelihood method is not much greater than 1. The estimates of the actual probabilities of the outside chances are therefore not much smaller than 1 / the payout values.
In the 3 favourites case above, the log likelihood method gives C = 1.054 and revised payouts of $3.18, $3.18, $3.18, $29.75, $61.75 and $128. With one overwhelming favourite, the method tacitly assumes its odds are approximately correct and decreases the probabilities of all other outcomes. With multiple favourites, there is no evidence to support this and so the actual probabilities are close to those obtained by the simple division method.
Such cases require a decision: do we believe the ratio of the long to short odds is approximately correct, or does the bulk of the excess probability in the improper probabilities come from understating the outsiders’ payoffs?
If the latter, we need to estimate the tail of the distribution separately ie. choose a threshold payout beyond which the information content reduces rapidly and adjust the payouts upward to decrease the size of the tail prior to applying the log likelihood method. Such a procedure is systematic, but necessarily subjective ie. not derivable from a priori principles.
Suppose we choose some threshold payout Q and multiply each payout d_k = 1/q_k by max(1, d_k/Q), so that payouts at or below Q are unchanged and longer ones are stretched. Note that this function is subjective and requires calibration to beliefs about the true tail probabilities, or equivalently, the true chance of the winner coming from the group of favourites. Other functions will achieve qualitatively the same result.
For example, let Q = $10 in the 3 favourites case above. Their payouts are unchanged. However, the others become $62.50, $250 and $1000. Applying the log likelihood method then gives fair payouts of $3.06, $3.06, $3.06, $67, $275 and $1130. These are commensurate with the belief that the winner will almost certainly be one of the 3 favourites.
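The threshold step on its own is a one-liner. Below is a minimal sketch with hypothetical names (stretch_long_odds, overround) showing the transformed payouts for the 3 favourites case and how much of the excess probability the transformation removes before the log likelihood method is applied.

```python
def stretch_long_odds(payouts, threshold):
    """Multiply each payout by max(1, payout / threshold).

    Payouts at or below the threshold are left alone; longer ones are lengthened.
    """
    return [p * max(1.0, p / threshold) for p in payouts]


def overround(payouts):
    """Sum of the payout-implied probabilities."""
    return sum(1.0 / p for p in payouts)


original = [3, 3, 3, 25, 50, 100]
stretched = stretch_long_odds(original, threshold=10)

print(stretched)                       # [3.0, 3.0, 3.0, 62.5, 250.0, 1000.0]
print(round(overround(original), 3))   # 1.07
print(round(overround(stretched), 3))  # 1.021
```

The implied probabilities of the stretched payouts still add to more than 1, so the log likelihood step is still needed to turn them into proper probabilities.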
Note that this payout threshold transformation can be applied prior to using the log likelihood method in the case of a single favourite if there is a firm belief the long odds have been understated. The effect is not so great, however. In the case of the odds for the federal seat of New England, firstly applying the threshold multiplication with Q = 10, the result after the log likelihood method is $1.015, $71, $2250, $4340 and $17500. Interestingly, the probability of the ALP candidate winning increases from 1/77 to 1/71. This typically happens: the second shortest priced candidate comes in a little if the transformation is made. This illustrates the importance of only applying such subjective transformations if there is some external evidence to support doing so.
Stay tuned … the reason I’ve given this topic so much thought is that I’m about to apply it to sports betting odds on individual seats in the upcoming federal election. This will allow a simulation of the overall election and an estimation of the chances of various outcomes, including the parliamentary majority gained by the winner.
