Note: This is earlier work I did (last winter/spring) so some info may seem dated at time of posting. I’ve used data files current to then.
In section 1, we prepared historical hockey data for analysis. In section 2, we set up some functions do prepare the Dixon-Coles parameters. Now, we can use them to predict a game.
Poisson analysis is good at predicting, based on an average, the proportion of each integer events happening. So, if we expect 3.1 goals to be scored by a certain team as an average, we know that the proportion of that is going to be (in R code) dpois(3,3.1), equalling 0.2236768. Similarly, the chance that 1 and 2 goals is scored by that team to be dpois(c(1,2), 3.1) to be 0.1396525 and 0.2164614 respectively.
The Dixon-Coles optimization gave us ‘attack’ and ‘defence’ parameters, along with a home field advantage factor home and a fudge fator for low-scoring games, ‘rho’ (ρ). For two teams, Home and Away, we can calculate their expected goals socred by adding their attack, plus the opposing teams’ defence, plus (for home) the home field advantage. This will give us a ‘lambda’ (λ) and ‘mu’ (μ) which are the home and away goals expected. For example, let’s use Toronto Maple Leafs vs. Montreal Canadiens:
Montreal is a stronger team this year, we expect a higher attack and lower defence factor. For the 2015 season only, they are, in fact, 0.9781757 and -0.267567. Similarly, Toronto’s factors are 0.9660961 and 0.0829082 respectively.
We’ll do the addition to get λ and μ:
Thus, we expect Montreal to score 3.1527745 goals and Toronto to score 2.0107929 goals.
Using this knowledge, we can create a probability matrix of Home and Away goals’ Poisson probability. Using λ and μ, probability_matrix <- dpois(0:maxgoal, lambda) %*% t(dpois(0:maxgoal, mu)), and a maximum number of goals per team of 8:
0
1
2
3
4
5
6
7
8
0
0.0057
0.0115
0.0116
0.0078
0.0039
0.0016
5e-04
2e-04
0
1
0.018
0.0363
0.0365
0.0244
0.0123
0.0049
0.0017
5e-04
1e-04
2
0.0284
0.0572
0.0575
0.0385
0.0194
0.0078
0.0026
7e-04
2e-04
3
0.0299
0.0601
0.0604
0.0405
0.0204
0.0082
0.0027
8e-04
2e-04
4
0.0236
0.0474
0.0476
0.0319
0.016
0.0065
0.0022
6e-04
2e-04
5
0.0149
0.0299
0.03
0.0201
0.0101
0.0041
0.0014
4e-04
1e-04
6
0.0078
0.0157
0.0158
0.0106
0.0053
0.0021
7e-04
2e-04
1e-04
7
0.0035
0.0071
0.0071
0.0048
0.0024
0.001
3e-04
1e-04
0
8
0.0014
0.0028
0.0028
0.0019
9e-04
4e-04
1e-04
0
0
Table: Probability of specific score matrix
This has the away score on the top (as columns), and the home score along the side (as rows). Recall that we need to apply the τ function to this matrix, to account for low scoring games. Once that is done, we can sum the diagonal of the matrix to find the probability of a tie. The sum of the upper triangle is the probability of an away win, and the sum of the lower triangle is the home win.
Putting it all together we get this function:
Now we can look at one of the results (say, 2015 not time weighted) and see the probability of a home win (Montreal), draw, or away win (Toronto): 0.5968457, 0.1736576, 0.2240437. In fact, we can plot the probabilities for the four fits we have, to show the effect of each fit type.
While the odds of a draw haven’t changed much, the odds of a Montreal or Toronto win have slightly. Note that over the past 10 years, Montreal has performed better, on average, than Toronto, so this is expected. As well, Monteal benefits from the home ice advantage. We can calculate this for a Toronto home game too.
Why the difference in these two results? That’s because the home advantage factor ranges not insignificantly. While not a huge range, it does impact the probabilities enough to make a noticeable difference.
We can predict scores for our Toronto at Montreal game with some random number work. First, we’ll modify the probability matrix to contain the sum of the probabilities to that point. Then we can index goals based on the probability correllating to a random number.
So, we can predict the score for a game with predictOneGame, and get a result of home away:
Lets’ do that a few times, to see the different results we get.
But, recall that the NHL doesn’t allow draws. We cold solve that by randomly choosing a winner, but that does a disservice to teams who excel at those scenarios. A simple way of adding OT is to use the log5 method, invented by Bill James, which applies the following formula, (from Wikipedia) The Log5 estimate for the probability of A defeating B is $p_{A,B} = \frac{p_A-p_A\times p_B}{p_A+p_B-2\times p_A\times p_B}$, where $p_A$ is the proportion of A wins, and $p_B$ is the proportion of B wins. For now, we’ll plug in even values (we’ll use pa=pb=0.5, so log5 = (pa-(pa*pb))/(pa+pb-(2*pa*pb)) will equal 0.5 as well), but later we can re-evaluate performance of each team.
We’ll also go 50/50 for Shootout and Overtime, but can adjust those odds later too.
Re-writing the score prediction formula will give us the chance to simulate draw handling.
Trying this score simulation again:
To predict the winner of a game in OT, we’ll use the win percentages of each team in the most recent season. Let’s grab the results from 2015 and figure out each team’s OT and Shootout results. We have to look at each game, so we’ll take this opportunity to collect the rest of the stats available to make a stats table.
We use it by calling makeStatsTable with the input being the season data.
Team
GP
W
OTL
L
ROW
P
GF
GA
OT.Win.Percent
New York Rangers
82
53
7
22
49
113
252
192
0.5882
Montreal Canadiens
82
50
10
22
43
110
221
189
0.619
Anaheim Ducks
82
51
7
24
43
109
236
226
0.6667
St. Louis Blues
82
51
7
24
42
109
248
201
0.6667
Tampa Bay Lightning
82
50
8
24
47
108
262
211
0.3333
Nashville Predators
82
47
10
25
41
104
232
208
0.5
Chicago Blackhawks
82
48
6
28
39
102
229
189
0.8
Vancouver Canucks
82
48
5
29
42
101
242
222
0.7059
Washington Capitals
82
45
11
26
40
101
242
203
0.4
New York Islanders
82
47
7
28
40
101
252
230
0.619
Minnesota Wild
82
46
8
28
42
100
231
201
0.5
Detroit Red Wings
82
43
14
25
39
100
235
221
0.4074
Ottawa Senators
82
43
13
26
37
99
238
215
0.4333
Winnipeg Jets
82
43
13
26
36
99
230
210
0.3929
Pittsburgh Penguins
82
43
12
27
39
98
221
210
0.3571
Calgary Flames
82
45
7
30
41
97
241
216
0.65
Boston Bruins
82
41
14
27
37
96
213
211
0.4062
Los Angeles Kings
82
40
15
27
38
95
220
205
0.1154
Dallas Stars
82
41
10
31
37
92
261
260
0.5
Florida Panthers
82
38
15
29
30
91
206
223
0.3214
Colorado Avalanche
82
39
12
31
29
90
219
227
0.4138
San Jose Sharks
82
40
9
33
36
89
228
232
0.375
Columbus Blue Jackets
82
42
5
35
33
89
236
250
0.8235
Philadelphia Flyers
82
33
18
31
30
84
215
234
0.2162
New Jersey Devils
82
32
14
36
27
78
181
216
0.2308
Carolina Hurricanes
82
30
11
41
25
71
188
226
0.3158
Toronto Maple Leafs
82
30
8
44
25
68
211
262
0.4
Edmonton Oilers
82
24
14
44
19
62
198
283
0.2414
Arizona Coyotes
82
24
8
50
19
56
170
272
0.5556
Buffalo Sabres
82
23
8
51
15
54
161
274
0.5625
Table: Table of statistics in 2015-2016 season to today.
We’ll feed it into a log5 function, and use that instead of a factor of 0.5 to determine OT winners. As well, it’s trivial to determine that on average, the number of games ending in shootout is slightly higher than the number ending in overtime.
Finally, one more round of predictions:
Next time we’ll look at predicting a whole sesason.