Elo Rating System: Ranking Champions League teams using Clojure
The Elo rating system uses the following formula to work out a player/team’s ranking after they’ve participated in a match:
R' = R + K * (S - E)
R' is the new rating
R is the old rating
K is the maximum amount by which a rating can increase or decrease (16 or 32 for Elo)
S is the score for a game
E is the expected score for a game
I converted that formula into the following Clojure functions:
which would be called like this to work out the new ranking of a team ranked 1200 that beat a team ranked 1500:
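A minimal sketch of such functions and that call, assuming the standard Elo expected-score curve with its conventional 400-point scale. The name `ranking-after-win` and the map keys it takes are illustrative choices, not confirmed names:

```clojure
;; Expected score E for a team against an opponent.
;; Assumes the conventional Elo logistic curve with a 400-point scale.
(defn expected [my-ranking opponent-ranking]
  (/ 1.0
     (+ 1.0 (Math/pow 10.0 (/ (- opponent-ranking my-ranking) 400.0)))))

;; R' = R + K * (S - E) with S = 1 for a win.
;; `ranking-after-win` and its keys are hypothetical names.
(defn ranking-after-win
  [{:keys [ranking opponent-ranking importance]}]
  (+ ranking (* importance (- 1 (expected ranking opponent-ranking)))))

;; A team ranked 1200 beats a team ranked 1500, with K = 32.
(ranking-after-win {:ranking 1200 :opponent-ranking 1500 :importance 32})
;; => roughly 1227.17
```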
The way it works is that we first work out the likelihood that we should win the match by calling expected:
This tells us that we have a 15% chance of winning the match, so if we do win then our ranking should be increased by a large amount, as we aren’t expected to win. In this case a win gives us a points increase of '32 * (1-0.15)', which is ~27 points.
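The same arithmetic can be sketched for all three outcomes of the 1200 vs 1500 match. This assumes the usual Elo convention of S = 1 for a win, 0.5 for a draw and 0 for a loss; `points-change` is a hypothetical helper introduced only for illustration:

```clojure
;; Expected score, assuming the conventional 400-point Elo scale.
(defn expected [my-ranking opponent-ranking]
  (/ 1.0
     (+ 1.0 (Math/pow 10.0 (/ (- opponent-ranking my-ranking) 400.0)))))

;; K * (S - E): the points gained or lost for a given score S.
;; `points-change` is a hypothetical name, not part of the original code.
(defn points-change [importance score my-ranking opponent-ranking]
  (* importance (- score (expected my-ranking opponent-ranking))))

(expected 1200 1500)               ; about 0.15
(points-change 32 1 1200 1500)     ; win:  about +27.2
(points-change 32 0.5 1200 1500)   ; draw: about +11.2
(points-change 32 0 1200 1500)     ; loss: about -4.8
```

Because the underdog is only expected to score about 0.15, even a draw moves its ranking up, while a loss costs it relatively little.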
I kept things simple by always setting the importance/maximum value of increase or decrease to 32. The World Football Elo Ratings take a different approach: they vary it based on the importance of a match and the margin of victory.
I decided to try out the algorithm on the 2002/2003 Champions League season. I was able to grab the data from The Rec Sport Soccer Statistics Foundation and I’ve written previously about how I scraped it using Enlive.
With a lot of help from Paul Bostrom I ended up with the following code to run a reduce over the matches while updating team rankings after each match:
The matches parameter that we pass into top-teams looks like this:
And calling the team-extraction function (https://github.com/mneedham/ranking-algorithms/blob/master/src/ranking_algorithms/parse.clj#L22) on it gets us a set of all the teams involved:
We then mapcat (http://clojuredocs.org/clojure_core/clojure.core/mapcat) over it to get a flat sequence of team/default points pairs:
before calling array-map (http://clojuredocs.org/clojure_core/clojure.core/array-map) to make a map of the result:
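The three steps above can be sketched as follows. The match keys (`:home`, `:away`, `:home_score`, `:away_score`), the placeholder teams and scores, the default of 1200 points and the name `extract-teams` are all assumptions made for illustration, not data or code taken from the real project:

```clojure
;; Illustrative matches only: placeholder teams and made-up scores.
(def matches
  [{:home "Team A" :away "Team B" :home_score 2 :away_score 1}
   {:home "Team C" :away "Team A" :home_score 0 :away_score 0}])

;; Hypothetical stand-in for the linked function: collect every team name.
(defn extract-teams [matches]
  (set (mapcat (juxt :home :away) matches)))

(extract-teams matches)
;; => a set containing "Team A", "Team B" and "Team C"

;; Flatten each team into a team / default-points pair.
;; The default of 1200 points is an assumption.
(def team-points-pairs
  (mapcat (fn [team] [team {:points 1200}]) (extract-teams matches)))

;; array-map takes alternating keys and values, so apply it to the flat sequence.
(def teams-with-rankings
  (apply array-map team-points-pairs))

(get teams-with-rankings "Team A")
;; => {:points 1200}
```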
We then apply a reduce over all the matches and call the function process-match on each iteration to update team rankings appropriately. The final step is to sort the teams by their ranking so we can list the top teams:
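A minimal sketch of that reduce and sort, under stated assumptions: matches are maps with `:home`, `:away`, `:home_score` and `:away_score` keys, every team starts on 1200 points, a draw scores 0.5 for both sides, and K is fixed at 32. The bodies of `process-match` and `top-teams` here are guesses at the shape of such functions, and the `number` parameter is hypothetical:

```clojure
;; Expected score, assuming the conventional 400-point Elo scale.
(defn expected [my-ranking opponent-ranking]
  (/ 1.0
     (+ 1.0 (Math/pow 10.0 (/ (- opponent-ranking my-ranking) 400.0)))))

;; Update both teams' points for a single match and return the new rankings map.
(defn process-match [rankings {:keys [home away home_score away_score]}]
  (let [home-points (get-in rankings [home :points])
        away-points (get-in rankings [away :points])
        ;; Score S from the home side's point of view.
        home-s (cond (> home_score away_score) 1
                     (= home_score away_score) 0.5
                     :else 0)
        k 32]
    (-> rankings
        (assoc-in [home :points]
                  (+ home-points
                     (* k (- home-s (expected home-points away-points)))))
        (assoc-in [away :points]
                  (+ away-points
                     (* k (- (- 1 home-s) (expected away-points home-points))))))))

;; Fold every match into the rankings, then list the `number` best teams.
(defn top-teams [number matches]
  (let [teams   (set (mapcat (juxt :home :away) matches))
        initial (into {} (map (fn [team] [team {:points 1200}]) teams))]
    (->> (reduce process-match initial matches)
         (sort-by (fn [[_ ranking]] (:points ranking)) >)
         (take number))))

;; Illustrative matches only: placeholder teams and made-up scores.
(def matches
  [{:home "Team A" :away "Team B" :home_score 2 :away_score 1}
   {:home "Team C" :away "Team A" :home_score 0 :away_score 0}])

(top-teams 3 matches)
;; Team A first, then Team C, then Team B
```

Both teams are updated from the rankings they held before the match, so the points one side gains are exactly the points the other loses.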
Interestingly Juventus, who reached the final and lost it to Milan on penalties, are only in 5th place, and the top 2 places are occupied by teams that lost in the quarter-finals. I wrote the following functions to investigate what was going on:
If we call it with Juventus we can see how they performed in their matches:
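A sketch of what such investigative functions could look like. The names `match-summary` and `show-matches`, the match keys and the placeholder data are all hypothetical, so the output is not the real Juventus record:

```clojure
;; Illustrative matches only: placeholder teams and made-up scores.
(def matches
  [{:home "Team A" :away "Team B" :home_score 2 :away_score 1}
   {:home "Team C" :away "Team A" :home_score 0 :away_score 0}
   {:home "Team B" :away "Team C" :home_score 1 :away_score 3}])

;; Describe one match from the point of view of `team`.
(defn match-summary [team {:keys [home away home_score away_score]}]
  (let [home?    (= team home)
        scored   (if home? home_score away_score)
        conceded (if home? away_score home_score)]
    {:opposition (if home? away home)
     :score      (str scored "-" conceded)
     :result     (cond (> scored conceded) :win
                       (= scored conceded) :draw
                       :else :loss)}))

;; Keep only the matches `team` played in and summarise each one.
(defn show-matches [team matches]
  (->> matches
       (filter (fn [m] (or (= team (:home m)) (= team (:away m)))))
       (map (partial match-summary team))))

(show-matches "Team A" matches)
;; => ({:opposition "Team B", :score "2-1", :result :win}
;;     {:opposition "Team C", :score "0-0", :result :draw})

;; Count outright wins for a team.
(count (filter #(= :win (:result %)) (show-matches "Team A" matches)))
;; => 1
```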
Although I’m missing the final (I need to fix the parser to pick that match up, and it was a draw anyway), Juventus actually won only 8 of their matches outright. Barcelona, on the other hand, won 13 matches, although 2 of those were qualifiers.
The next step is to take the importance of each match into account, rather than applying an importance of 32 across the board, and to add some value for winning a tie/match even if it’s on penalties or away goals.
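One way that next step might be sketched is shown here. Everything in it is an assumption: the `:round` and `:tie_winner` keys, the importance value for each round, and the 0.75/0.25 split for a drawn match whose tie was settled on penalties or away goals are placeholders, not values from any real ranking scheme:

```clojure
;; Expected score, assuming the conventional 400-point Elo scale.
(defn expected [my-ranking opponent-ranking]
  (/ 1.0
     (+ 1.0 (Math/pow 10.0 (/ (- opponent-ranking my-ranking) 400.0)))))

;; Placeholder importance (K) per round; these numbers are invented.
(def importance-by-round
  {:qualifying 16, :group 24, :knockout 32, :final 48})

;; Fall back to 32 when a match has no recognised :round.
(defn importance [match]
  (get importance-by-round (:round match) 32))

;; Score S for `team`. For a drawn match, :tie_winner (hypothetical key) names
;; the side that went through on penalties or away goals, if any.
(defn score-for [team {:keys [home home_score away_score tie_winner]}]
  (let [home?    (= team home)
        scored   (if home? home_score away_score)
        conceded (if home? away_score home_score)]
    (cond (> scored conceded) 1
          (< scored conceded) 0
          (nil? tie_winner)   0.5
          (= team tie_winner) 0.75
          :else               0.25)))

;; R' = R + K * (S - E), with K and S now depending on the match.
(defn new-ranking [team ranking opponent-ranking match]
  (+ ranking
     (* (importance match)
        (- (score-for team match) (expected ranking opponent-ranking)))))

;; Two equally ranked teams draw a final that Team A wins on penalties.
(def final-match
  {:home "Team A" :away "Team B" :home_score 0 :away_score 0
   :round :final :tie_winner "Team A"})

(new-ranking "Team A" 1200 1200 final-match) ; => 1212.0
(new-ranking "Team B" 1200 1200 final-match) ; => 1188.0
```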
The code is on GitHub if you want to play around with it or have suggestions for something else I can try out.