On Twitter, there are many excellent hockey analytics folks. One in particular, @IneffectiveMath, is worth following. Among other things, like great visualizations of data, he’s running a contest this year that rates models on their season-long predictions, using the following scoring scheme:
The score associated to a given estimate will be computed as the probability of drawing the actual result from a normal distribution with mean given by the estimated point total and standard deviation given by the estimated uncertainty. The maximum possible score for a single team is 1, which can only be obtained by specifying the team’s point total exactly with an uncertainty of 0 – in this case you will score 0 for even the slightest deviation from your estimate. Specifying a larger uncertainty will help you capture some points even when your estimate is poor; specifying a smaller uncertainty will give you a larger score when your estimate is good. The overall score is the sum of the thirty team scores, so the maximum theoretical score is thirty.
Thus, at the end of the season, a prediction is scored by summing dnorm(x=ActualPoints, mean=Points, sd=PointsSD) over all teams. Since dnorm() is vectorized, you can feed in columns of a data.frame and take the sum().
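For a sense of scale, here’s a toy example with made-up numbers: a team predicted to finish with 95 points and an uncertainty of 8 that actually finishes with 98 contributes a little under 0.05 to the total.

```r
# Toy example (made-up numbers): predicted 95 points with sd 8, actual finish 98.
dnorm(x = 98, mean = 95, sd = 8)
# ~ 0.046
```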
So, let’s try it.
Having prepared a 2014-2015 schedule, and recorded the 2014-2015 season points for each team, we can run our Elo rating predictions through that method to see how they do.
We’ll simulate a season using some more R-style programming than past season predictors. We sample from the list [0, 0.4, 0.6, 1], corresponding to an away win, away OT win, home OT win, and home win, with probabilities, based on pWin (the probability of the home team winning), of [1-pWin, 0.1-0.1*pWin, 0.1*pWin, pWin]. This gives us about a 10% chance of a game going to OT, with a team’s chances in OT the same as in regulation. These weights add up to 1.1 rather than 1 (e.g., pWin = 1 gives [0, 0, 0.1, 1]), but the R function sample() rescales the probabilities as needed.
We’ll vectorize the sampling function so we can feed in a vector of pWins and get a matrix of samples back, repeated as many times as we want to simulate the season.
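Something like the following sketch captures the idea (the function names and arguments here are placeholders, not necessarily the ones in the actual code):

```r
# One game: sample an outcome code for the home team, given its win
# probability pWin. Codes: 0 = away regulation win, 0.4 = away OT win,
# 0.6 = home OT win, 1 = home regulation win.
sampleGame <- function(pWin, nSims = 1) {
  sample(c(0, 0.4, 0.6, 1), size = nSims, replace = TRUE,
         prob = c(1 - pWin, 0.1 - 0.1 * pWin, 0.1 * pWin, pWin))
}

# Vectorize over pWin: feed in the home-win probability for every game on
# the schedule and get back a matrix with one row per simulated season and
# one column per game.
sampleSeason <- Vectorize(sampleGame, vectorize.args = "pWin")
```

With that, sampleSeason(schedule$pWin, nSims = 100) would return a 100 × (number of games) matrix of simulated outcomes.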
Similarly, we’ll vectorize a formula to give us the probability of a home-team win given the Elo rating difference between the two teams:
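A minimal sketch, using the conventional logistic Elo curve on the 400-point scale (any home-ice bonus would be folded into the rating difference before this step; the function and column names are assumptions):

```r
# Probability of a home win from the Elo rating difference (home minus away).
homeWinProb <- function(eloDiff) {
  1 / (1 + 10 ^ (-eloDiff / 400))
}

# R arithmetic is element-wise, so this already works on a whole column:
# schedule$pWin <- homeWinProb(schedule$HomeElo - schedule$AwayElo)
```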
Finally, those two functions get put together with some data munging and compilation, and we get a predicted finish for each team after a certain number of sims.
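As a rough sketch of how the pieces might fit together (the schedule column names below – HomeTeam, AwayTeam, HomeElo, AwayElo – are assumptions, and this version only tracks points, not the full win/OT-loss breakdown):

```r
simulateSeasons <- function(schedule, nSims = 1000) {
  pWin     <- homeWinProb(schedule$HomeElo - schedule$AwayElo)
  outcomes <- sampleSeason(pWin, nSims)          # nSims x nGames matrix
  # Translate the coded outcomes into standings points for each side:
  # 0 = away reg. win, 0.4 = away OT win, 0.6 = home OT win, 1 = home reg. win.
  homePts <- ifelse(outcomes >= 0.6, 2, ifelse(outcomes == 0.4, 1, 0))
  awayPts <- ifelse(outcomes <= 0.4, 2, ifelse(outcomes == 0.6, 1, 0))
  teams <- sort(unique(c(schedule$HomeTeam, schedule$AwayTeam)))
  # Season point totals: one row per simulated season, one column per team.
  sapply(teams, function(tm) {
    rowSums(homePts[, schedule$HomeTeam == tm, drop = FALSE]) +
      rowSums(awayPts[, schedule$AwayTeam == tm, drop = FALSE])
  })
}
```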
Now, when this runs, we get a table of expected Wins, OT Losses, and Points for each team (mean, max, min, and sd). This gives us the information we need to calculate a score for the season.
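Condensing the simulated totals into that summary table could look something like this, where pointsMatrix is the output of the sketch above and the column names are chosen to match the dnorm() call:

```r
# Summarize each team's simulated point totals (mean, sd, max, min).
results <- data.frame(
  Team      = colnames(pointsMatrix),
  Points    = apply(pointsMatrix, 2, mean),
  PointsSD  = apply(pointsMatrix, 2, sd),
  MaxPoints = apply(pointsMatrix, 2, max),
  MinPoints = apply(pointsMatrix, 2, min)
)
```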
We can score it simply this way:
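Sketched with the hypothetical results table from above, after adding an ActualPoints column holding each team’s real 2014-2015 finish:

```r
seasonScore <- sum(dnorm(x    = results$ActualPoints,
                         mean = results$Points,
                         sd   = results$PointsSD))
seasonScore
```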
Next time we’ll optimize to this instead of by-game metrics, and see how we do.