Mark Needham

Thoughts on Software Development

Archive for the ‘Ranking Systems’ Category

Elo Rating System: Ranking Champions League teams using Clojure Part 2

without comments

A few weeks ago I wrote about ranking Champions League teams using the Elo Rating algorithm, and since I wrote that post I’ve collated data for 10 years worth of matches so I thought an update was in order.

After extracting the details of all those matches I saved them to a JSON file so that I wouldn’t have to parse the HTML pages every time I tweaked the algorithm. This should also make it easier for other people to play with the data.

I described the algorithm in the previous post and had analysed the rankings for one season. However, it was difficult to understand why teams has been ranked in a certain order so I drilled into the data to find out.

Since the 2012/2013 season is the freshest in my memory I started with that.

The first thing to do was load the matches from disk:

(ns ranking-algorithms.uefa
  (:require [clj-time.format :as f])
  (:require [ :as json]))
(defn as-date [date-field] (f/parse (f/formatter "dd MMM YYYY") date-field ))
(defn date-aware-value-reader [key value] (if (= key :date) (as-date value) value))
(defn read-from-file [file]
  (json/read-str (slurp file)
                 :value-fn date-aware-value-reader
                 :key-fn keyword))

I’ve written previously about reifying a date masquerading as a string but for now we want to give those matches a name and then process them.

> (def the-matches (read-from-file "data/cl-matches-2013.json"))
> (count the-matches)

I already had a function top-teams which would apply the Elo algorithm across the matches, but since I wanted to drill into each team’s performance I wrapped that function in another one called print-top-teams:

(comment "other functions excluded for brevity")
(defn format-for-printing [all-matches idx [team ranking & [rd]]]
  (let [team-matches (show-matches team all-matches)]
    (merge  {:rank (inc idx)
             :team team
             :ranking ranking
             :rd rd
             :round (performance team-matches)}
            (match-record team-matches))))
(defn print-top-teams
  ([number all-matches] (print-top-teams number all-matches {}))
  ([number all-matches base-rankings]
       [:rank :team :ranking :round :wins :draw :loses]
        (partial format-for-printing all-matches)
        (top-teams number all-matches base-rankings)))))
> (ranking-algorithms.core/print-top-teams 10 the-matches)
:rank | :team       | :ranking | :round         | :wins | :draw | :loses
1     | Bayern      | 1272.74  | Final          | 10    | 1     | 2     
2     | PSG         | 1230.02  | Quarter-finals | 6     | 3     | 1     
3     | Dortmund    | 1220.96  | Final          | 7     | 4     | 2     
4     | Real Madrid | 1220.33  | Semi-finals    | 6     | 3     | 3     
5     | Porto       | 1216.97  | Round of 16    | 5     | 1     | 2     
6     | CFR Cluj    | 1216.56  | Group stage    | 7     | 1     | 2     
7     | Galatasaray | 1215.56  | Quarter-finals | 5     | 2     | 3     
8     | Juventus    | 1214.0   | Quarter-finals | 5     | 3     | 2     
9     | Málaga      | 1211.53  | Quarter-finals | 5     | 5     | 2     
10    | Valencia    | 1211.0   | Round of 16    | 4     | 2     | 2     

I’ve excluded most of the functions but you can find the source in core.clj on github.

Clojure-wise I learnt about the print-table function which came in handy and Elo-wise I realised that the ranking places a heavy emphasis on winning matches.

If you follow the Champions League closely you’ll have noticed that Barcelona are missing from the top 10 despite reaching the Semi Final. The Elo algorithm actually ranks them in 65th position:

:rank | :team               | :ranking | :round                  | :wins | :draw | :loses
63    | Motherwell          | 1195.04  | Third qualifying round  | 0     | 0     | 2     
64    | Feyenoord           | 1195.04  | Third qualifying round  | 0     | 0     | 2     
65    | Barcelona           | 1194.68  | Semi-finals             | 5     | 3     | 4     
66    | BATE                | 1194.36  | Group stage             | 5     | 3     | 4     
67    | Anderlecht          | 1193.41  | Group stage             | 4     | 2     | 4 

It’s all about winning

I thought there might be a bug in my implementation but having looked through it multiple times, Barcelona’s low ranking results from losing multiple games – to Celtic, AC Milan and twice to Bayern Munich – and progressing from their Quarter Final tie against PSG without winning either match.

I did apply a higher weighting to matches won later on in the competition but otherwise the Elo algorithm doesn’t take into account progress in a tournament.

Low variation in rankings

The initial ranking of each team was 1200, so I was surprised to see that the top ranked team had only achieved a ranking of 1272 – I expected it to be higher.

I read a bit more about the algorithm and learnt that a 200 points gap in ranking signifies that the higher ranked team should win 75% of the time.

For this data set the top ranked team has 1272 points and the lowest ranked team has 1171 points so we probably need to tweak the algorithm to make it more accurate.

Accuracy of the Elo algorithm

My understanding of the Elo algorithm is that it becomes more accurate as teams play more matches so I decided to try it out on all the matches from 2004 – 2012.

I adapted the print-top-teams function to exclude ‘:round’ since it doesn’t make sense in this context:

(comment "I really need to pull out the printing stuff into a function but I'm lazy so I haven't...yet")
(defn print-top-teams-without-round
  ([number all-matches] (print-top-teams-without-round number all-matches {}))
  ([number all-matches base-rankings]
       [:rank :team :ranking :wins :draw :loses]
        (partial format-for-printing all-matches)
        (top-teams number all-matches base-rankings)))))

If we evaluate that function we see the following rankings:

> (def the-matches (read-from-file "data/cl-matches-2004-2012.json"))
> (ranking-algorithms.core/print-top-teams-without-round 10 the-matches)
:rank | :team          | :ranking | :wins | :draw | :loses
1     | Barcelona      | 1383.85  | 55    | 25    | 12    
2     | Man. United    | 1343.54  | 49    | 21    | 14    
3     | Chelsea        | 1322.0   | 44    | 27    | 17    
4     | Real Madrid    | 1317.68  | 42    | 14    | 18    
5     | Bayern         | 1306.18  | 42    | 13    | 19    
6     | Arsenal        | 1276.83  | 47    | 21    | 18    
7     | Liverpool      | 1272.52  | 41    | 17    | 17    
8     | Internazionale | 1260.27  | 36    | 18    | 21    
9     | Milan          | 1257.63  | 34    | 22    | 18    
10    | Bordeaux       | 1243.04  | 12    | 3     | 7     

The only finalists missing from this list are Monaco and Porto who contested the final in 2004 but haven’t reached that level of performance since.

Bordeaux are the only unlikely entry in this list and have played ~60 games less than the other teams which suggests that we might not have as much confidence in their ranking. It was at this stage that I started looking at the Glicko algorithm which calculates a rating reliability as well as the rating itself.

I’ve added instructions to the github README showing some examples but if you have any questions feel free to ping me on here or twitter.

Written by Mark Needham

September 30th, 2013 at 8:26 pm

Posted in Ranking Systems

Tagged with ,

Glicko Rating System: A simple example using Clojure

with 2 comments

A couple of weeks ago I wrote about the Elo Rating system and when reading more about it I learnt that one of its weaknesses is that it doesn’t take into account the reliability of a players’ rating.

For example, a player may not have played for a long time. When they next play a match we shouldn’t assume that the accuracy of that rating is the same as for another player with the same rating but who plays regularly.

Mark Glickman wrote the Glicko Rating System to take the uncertainty into account by introducing a ‘ratings deviation’ (RD). A low RD indicates that a player competes frequently and a higher RD indicates that they don’t.

One other difference between Glicko and Elo is the following:

It is interesting to note that, in the Glicko system, rating changes are not balanced as they usually are in the Elo system.

If one player’s rating increases by x, the opponent’s rating does not usually decrease by x as in the Elo system.

In fact, in the Glicko system, the amount by which the opponent’s rating decreases is governed by both players’ RD’s.

The RD value effectively tells us the range in which the player’s actual rating probably exists. i.e. a 95% confidence interval.

e.g. if a player has a rating of 1850 and a RD of 50 then the interval is 1750 – 1950 or (Rating – 2*RD)(Rating + 2*RD)

The algorithm has 2 steps:

  1. Determine a rating and RD for each player at the onset of the rating period. If the player is unrated use a value of 1500 and RD of 350. If they do have a rating we’ll calculate the new RD from the old RD using this formula:
    Glicko rd


    • t is the number of rating periods since last competition (e.g., if the player
      competed in the most recent rating period, t = 1)
    • c is a constant that governs the increase in uncertainty over time.
  2. Update each players rating and RD separately using the following formula:


    • r is the player’s pre-period rating
    • RD is the player’s pre-period ratings deviation
    • r1, r2,…,rm are the pre-period ratings of their opponents
    • RD1, RD2,…,RDm are the pre-period ratings deviations of their opponents
    • s1, s2,…,2m are the scores against the opponents. 1 is a win, 1/2 is a draw, 0 is a defeat.
    • r’ is the player’s post-period rating
    • RD’ is the player’s post-period ratings deviation

The paper provides an example to follow and includes the intermediate workings which made it easier to build the algorithm one function at a time.

The q function was the simplest to implement so I created that and the g function at the same time:

(ns ranking-algorithms.glicko
  (:require [clojure.math.numeric-tower :as math]))
(def q
  (/ (java.lang.Math/log 10) 400))
(defn g [rd]
  (/ 1
     (java.lang.Math/sqrt (+ 1
                             (/ (* 3 (math/expt q 2) (math/expt rd 2))
                                (math/expt ( . Math PI) 2))))))

We can use the following table to check we get the right results when we call it.:

Glicko table

> (g 30)
> (g 100)
> (g 300)

The next easiest function to write was the E function:

(defn e [rating opponent-rating opponent-rd]
  (/ 1
     (+ 1
        (math/expt 10 (/ (* (- (g opponent-rd))
                            (- rating opponent-rating))

And if we test that assuming that we have a rating of 1500 with a RD of 200:

> (e 1500 1400 30)
> (e 1500 1550 100)
> (e 1500 1700 300)

Finally we need to write the d2 supporting function:

(defn d2 [opponents]
  (/ 1  (* (math/expt q 2)
           (reduce process-opponent 0 opponents))))
(defn process-opponent [total opponent]
  (let [{:keys [g e]} opponent]
    (+ total (* (math/expt g 2) e (- 1 e)))))

In this function we need to sum a combination of the g and e values we calculated earlier for each opponent so we can use a reduce over a collection of those values for each opponent to do that:

> (d2 [{:g (g 30) :e (e 1500 1400 30)} 
       {:g (g 100) :e (e 1500 1550 100)} 
       {:g (g 300) :e (e 1500 1700 300)}])

I get a slightly different value for this function which I think is because I didn’t round the intermediate values to 2 decimal places as the example does.

Now we can introduce the r’ function which returns our ranking after taking the matches against these opponents into account:

(defn update-ranking [ranking-delta opponent]
  (let [{:keys [ranking opponent-ranking opponent-ranking-rd score]} opponent]
    (+ ranking-delta
       (* (g opponent-ranking-rd)
          (- score (e ranking opponent-ranking opponent-ranking-rd))))))
(defn g-and-e
  [ranking {o-rd :opponent-ranking-rd o-ranking :opponent-ranking}]
  {:g (g o-rd) :e (e ranking o-ranking o-rd)})
(defn ranking-after-round
  [{ ranking :ranking rd :ranking-rd opponents :opponents}]  
  (+ ranking
     (* (/ q
           (+ (/ 1 (math/expt rd 2))
              (/ 1 (d2 (map (partial g-and-e ranking) opponents)))))
        (reduce update-ranking 0 (map #(assoc-in % [:ranking] ranking) opponents)))))

One thing I wasn’t sure about here was the use of partial which is a bit of a Haskell idiom. I’m not sure what the favoured approach is in Clojure land yet.

If we execute that function we get the expected result:

> (ranking-after-round { :ranking 1500 
                         :ranking-rd 200 
                         :opponents[{:opponent-ranking 1400 :opponent-ranking-rd 30 :score 1} 
                                    {:opponent-ranking 1550 :opponent-ranking-rd 100 :score 0} 
                                    {:opponent-ranking 1700 :opponent-ranking-rd 300 :score 0}]})

The only function missing now is RD’ which returns our RD after taking these matches into account:

(defn rd-after-round
  [{ ranking :ranking rd :ranking-rd opponents :opponents}]
  (java.lang.Math/sqrt (/ 1 (+ (/ 1 (math/expt rd 2)
                               (/ 1 (d2 (map (partial g-and-e ranking) opponents)))))))

If we execute that function we get the expected result and we’re done!

> (rd-after-round { :ranking 1500 
                    :ranking-rd 200 
                    :opponents[{:opponent-ranking 1400 :opponent-ranking-rd 30 :score 1} 
                               {:opponent-ranking 1550 :opponent-ranking-rd 100 :score 0} 
                               {:opponent-ranking 1700 :opponent-ranking-rd 300 :score 0}]})

The next step is to run this algorithm against the football data and see if its results differ to the ones I got with the Elo algorithm.

I’m still not quite sure what I should set the rating period to. My initial thinking was that the rating period could be a season but that would mean that a team’s rating only really makes sense after a few seasons of matches.

The code is on github if you want to play with it and if you have any suggestions on how to make the code more idiomatic I’d love to hear them.

Written by Mark Needham

September 14th, 2013 at 9:02 pm

Elo Rating System: Ranking Champions League teams using Clojure

with 4 comments

As I mentioned in an earlier blog post I’ve been learning about ranking systems and one of the first ones I came across was the Elo rating system which is most famously used to rank chess players.

The Elo rating system uses the following formula to work out a player/team’s ranking after they’ve participated in a match:

R’ = R + K * (S – E)

  • R’ is the new rating
  • R is the old rating
  • K is a maximum value for increase or decrease of rating (16 or 32 for ELO)
  • S is the score for a game
  • E is the expected score for a game

I converted that formula into the following Clojure functions:

(defn ranking-after-win
  [{ ranking :ranking opponent-ranking :opponent-ranking importance :importance}]
  (+ ranking (* importance (- 1 (expected ranking opponent-ranking) ))))
(defn ranking-after-loss
  [{ ranking :ranking opponent-ranking :opponent-ranking importance :importance}]
  (+ ranking (* importance (- 0 (expected ranking opponent-ranking) ))))
(defn expected [my-ranking opponent-ranking]
  (/ 1.0
     (+ 1 (math/expt 10 (/ (- opponent-ranking my-ranking) 400)))))

which would be called like this to work out the new ranking of a team ranked 1200 that beat a team ranked 1500:

> (ranking-after-win { :ranking 1200 :opponent-ranking 1500 :importance 32 })

The way it works is that we first work out the likelihood that we should win the match by calling expected:

> (expected 1200 1500)

This tells us that we have a 15% chance of winning the match so if we do win then our ranking should be increased by a large amount as we aren’t expected to win. In this case a win gives us a points increase of ’32 * (1-0.15)’ which is ~27 points.

I kept things simple by always setting the importance/maximum value of increase or decrease to 32. The World Football Rankings took a different approach where they vary it based on the importance of a match and the margin of victory.

I decided to try out the algorithm on the 2002/2003 Champions League season. I was able to grab the data from The Rec Sport Soccer Statistics Foundation and I’ve written previously about how I scraped it using Enlive.

With a lot of help from Paul Bostrom I ended up with the following code to run a reduce over the matches while updating team rankings after each match:

(defn top-teams [number matches]
  (let [teams-with-rankings
    (apply array-map (mapcat (fn [x] [x {:points 1200}]) (extract-teams matches)))]
      (take number
        (sort-by (fn [x] (:points (val x)))
                 (seq (reduce process-match teams-with-rankings matches))))))
(defn process-match [ts match]
  (let [{:keys [home away home_score away_score]} match]
     (> home_score away_score)
     (-> ts
         (update-in  [home :points]
                     #(ranking-after-win {:ranking % :opponent-ranking (:points (get ts away)) :importance 32}))
         (update-in  [away :points]
                     #(ranking-after-loss {:ranking % :opponent-ranking (:points (get ts home)) :importance 32}))) 
     (> away_score home_score)
     (-> ts
         (update-in  [home :points]
                     #(ranking-after-loss {:ranking % :opponent-ranking (:points  (get ts away)) :importance 32}))
         (update-in  [away :points]
                     #(ranking-after-win {:ranking % :opponent-ranking (:points (get ts home)) :importance 32})))
     (= home_score away_score) ts)))

The matches parameter that we pass into top-teams looks like this:

> (take 5 all-matches)
({:home "Tampere", :away "Pyunik Erewan", :home_score 0, :away_score 4} {:home "Pyunik Erewan", :away "Tampere", :home_score 2, :away_score 0} {:home "Skonto Riga", :away "Barry Town", :home_score 5, :away_score 0} {:home "Barry Town", :away "Skonto Riga", :home_score 0, :away_score 1} {:home "Portadown", :away "Belshina Bobruisk", :home_score 0, :away_score 0})

And calling extract-teams on it gets us a set of all the teams involved:

> (extract-teams (take 5 all-matches))
#{"Portadown" "Tampere" "Pyunik Erewan" "Barry Town" "Skonto Riga"}

We then mapcat over it to get a vector containing team/default points pairs:

> (mapcat (fn [x] [x {:points 1200}]) (extract-teams (take 5 all-matches)))
("Portadown" {:points 1200} "Tampere" {:points 1200} "Pyunik Erewan" {:points 1200} "Barry Town" {:points 1200} "Skonto Riga" {:points 1200})

before calling array-map to make a hash of the result:

> (apply array-map (mapcat (fn [x] [x {:points 1200}]) (extract-teams (take 5 all-matches))))
{"Portadown" {:points 1200}, "Tampere" {:points 1200}, "Pyunik Erewan" {:points 1200}, "Barry Town" {:points 1200}, "Skonto Riga" {:points 1200}}

We then apply a reduce over all the matches and call the function process-match on each iteration to update team rankings appropriately. The final step is to sort the teams by their ranking so we can list the top teams:

> (top-teams 10 all-matches)
(["CF Barcelona" {:points 1343.900393287903}] 
 ["Manchester United" {:points 1292.4731214788262}] 
 ["FC Valencia" {:points 1277.1820905112208}] 
 ["Internazionale Milaan" {:points 1269.8028023141364}] 
 ["AC Milan" {:points 1257.4564374787687}]
 ["Juventus Turijn" {:points 1254.2498432522466}] 
 ["Real Madrid" {:points 1248.0758162475993}] 
 ["Deportivo La Coruna" {:points 1235.7792317210403}] 
 ["Borussia Dortmund" {:points 1231.1671952364256}] 
 ["Sparta Praag" {:points 1229.3249513256828}])

Interestingly the winners (Juventus) are only in 5th place and the top 2 places are occupied by teams that lost in the Quarter Final. I wrote the following functions to investigate what was going on:

(defn show-matches [team matches]
  (->> matches
       (filter #(or (= team (:home %)) (= team (:away %))))
       (map #(show-opposition team %))))
(defn show-opposition [team match]
  (if (= team (:home match))
    {:opposition (:away match) :score (str (:home_score match) "-" (:away_score match))}
    {:opposition (:home match) :score (str (:away_score match) "-" (:home_score match))}))

If we call it with Juventus we can see how they performed in their matches:

ranking-algorithms.parse> (show-matches "Juventus Turijn" all-matches)
({:opposition "Feyenoord", :score "1-1"} 
 {:opposition "Dynamo Kiev", :score "5-0"} 
 {:opposition "Newcastle United", :score "2-0"} 
 {:opposition "Newcastle United", :score "0-1"} 
 {:opposition "Feyenoord", :score "2-0"} 
 {:opposition "Dynamo Kiev", :score "2-1"} 
 {:opposition "Deportivo La Coruna", :score "2-2"} 
 {:opposition "FC Basel", :score "4-0"} 
 {:opposition "Manchester United", :score "1-2"} 
 {:opposition "Manchester United", :score "0-3"} 
 {:opposition "Deportivo La Coruna", :score "3-2"} 
 {:opposition "FC Basel", :score "1-2"} 
 {:opposition "CF Barcelona", :score "1-1"} 
 {:opposition "CF Barcelona", :score "2-1"} 
 {:opposition "Real Madrid", :score "1-2"} 
 {:opposition "Real Madrid", :score "3-1"})

Although I’m missing the final – I need to fix the parser to pick that match up and it was a draw anyway – they actually only won 8 of their matches outright. Barcelona, on the other hand, won 13 matches although 2 of those were qualifiers.

The next step is to take into account the importance of the match rather than applying an importance of 32 across the board and adding some value to winning a tie/match even if it’s on penalties or away goals.

The code is on github if you want to play around with it or have suggestions for something else I can try out.

Written by Mark Needham

August 31st, 2013 at 1:01 pm

Ranking Systems: What I’ve learnt so far

with 5 comments

I often go off on massive tangents reading all about a new topic but don’t record what I’ve read so if I go back to the topic again in the future I have to start from scratch which is quite frustrating.

In this instance after playing around with calculating the eigenvector centrality of a sub graph I learnt that this algorithm can also be used in ranking systems.

I started off by reading a paper written by James Keener about the Perron-Frobenius Theorem and the ranking of American football teams.

The Perron-Frobenius Theorem asserts the following:

a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components

This is applicable for network based ranking systems as we can build up a matrix of teams, store a value representing their performance against each other, and then calculate an ordered ranking based on eigenvector centrality.

I also came across the following articles describing different network-based approaches to ranking teams/players in tennis and basketball respectively:

Unfortunately I haven’t come across any corresponding code showing how to implement those algorithms so I need to do a bit more reading and figure out how to do it.

In the world of non network based ranking systems I came across 3 algorithms:

  • Elo – this is a method originally developed to calculate the relative skill of chess players.

    Players start out with an average rating which then increases/decreases based on games they take part in. If they beat someone much more highly ranked then they’d gain a lot of points whereas losing to someone similarly ranked wouldn’t affect their ranking too much.

    I came across a version used to rank country football teams. and the algorithm is quite well described in Christopher Allen’s article on competitive ranking systems.

  • Glicko – this method was developed as the author, Mark Glickman, detected some flaws in the Elo rating system around the reliability of players’ ratings.

    This algorithm therefore introduces the concept of a ratings deviation (RD) to measure uncertainty in a rating. If a player player plays regularly they’d have a low RD and if they don’t it’d be higher. This is then taken into account when assigning points based on games between different players.

    Rob Kohr has an implementation of this one using Javascript on his github.

  • TrueSkill – this one was developed by Microsoft Research to rank players using XBox Live. This seems similar to Glicko in that it has a rating and uncertainty for each player. TrueSkill’s FAQs suggest the following difference between the two:

    Glicko was developed as an extension of ELO and was thus naturally limited to two player matches which end in either win or loss. Glicko cannot update skill levels of players if they compete in multi-player events or even in teams. The logistic model would make it computationally expensive to deal with team and multi-player games. Moreover, chess is usually played in pre-set tournaments and thus matching the right opponents was not considered a relevant problem in Glicko. In contrast, the TrueSkill ranking system offers a way to measure the quality of a match between any set of players.

Scott Hamilton has an implementation of all these algorithms in Python which I need to play around with. He based his algorithms on a blog post written by Jeff Moser in which he explains probabilities, the Gaussian distribution, Bayesian probability and factor graphs in deciphering the TrueSkill algorithm. Moser’s created a project implementing TrueSkill in C# on github.

I follow tennis and football reasonably closely so I thought I’d do a bit of reading about the main two rankings I know about there as well:

  • UEFA club coefficients – used to rank football clubs that have taken part in a European competition over the last 5 seasons. It takes into account the importance of the match but not the strength of the opposition
  • ATP Tennis Rankings – used to rank tennis players on a rolling basis over the last 12 months. They take into account the importance of a tournament and the round a player reached to assign ranking points.

Now that I’ve recorded all that it’s time to go and play with some of them!

Written by Mark Needham

August 24th, 2013 at 11:05 am