Mark Needham

Thoughts on Software Development

Strava: Calculating the similarity of two runs

I go running several times a week and wanted to compare my runs against each other to see how similar they are.

I record my runs with the Strava app and it has an API that returns lat/long coordinates for each run in the Google encoded polyline algorithm format.

We can use the polyline library to decode these values into a list of lat/long tuples. For example:

```python
import polyline
polyline.decode('u{~vFvyys@fS]')
[(40.63179, -8.65708), (40.62855, -8.65693)]
```
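To get a feel for what `decode` is doing, here is a rough pure-Python sketch of the decoding step of the encoded polyline format. It is not the polyline library's own code: `decode_polyline` is a name I made up, and it assumes the default precision of 5 decimal places.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into (lat, lng) tuples (illustrative sketch)."""
    factor = 10 ** precision
    coords = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # one latitude delta, then one longitude delta
            shift = result = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if chunk < 0x20:  # continuation bit not set: last chunk
                    break
            # The lowest bit is the sign; the remaining bits are the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


print(decode_polyline('u{~vFvyys@fS]'))
# [(40.63179, -8.65708), (40.62855, -8.65693)]
```

Each point after the first is stored as an offset from the previous one, which is why the second pair in the string is so much shorter than the first.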

Once we’ve got each route defined as a list of coordinates we need a way to compare them. My Googling led me to an algorithm called Dynamic Time Warping (DTW):

DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions.

The sequences are “warped” non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension.

The fastdtw library implements an approximation of this algorithm and returns a value indicating the distance between two sequences of points.
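To make the idea concrete, here is a minimal sketch of the classic, exact dynamic-programming form of DTW. This is not how fastdtw works internally (it approximates this result more cheaply), `dtw_distance` is my own name, and it assumes Python 3.8+ for `math.dist`.

```python
import math


def dtw_distance(a, b, dist=math.dist):
    """Exact DTW cost between two sequences of points (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            # Either advance in a, advance in b, or advance in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


route = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
same_route_extra_sample = [(0.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 2.0)]
shifted_route = [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]

print(dtw_distance(route, same_route_extra_sample))  # 0.0
print(dtw_distance(route, shifted_route))            # 3.0
```

The first result is zero even though the sequences have different lengths, which is the "warping" at work: the repeated sample is simply matched to the same point twice.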

We can see how to apply fastdtw and polyline against Strava data in the following example:

```python
import os
import polyline
import requests
from fastdtw import fastdtw

token = os.environ["TOKEN"]
headers = {'Authorization': "Bearer {0}".format(token)}

def find_points(activity_id):
    r = requests.get("https://www.strava.com/api/v3/activities/{0}".format(activity_id), headers=headers)
    response = r.json()
    line = response["map"]["polyline"]
    return polyline.decode(line)
```

Now let’s try it out on two runs, 1361109741 and 1346460542:

```python
from scipy.spatial.distance import euclidean

activity1_id = 1361109741
activity2_id = 1346460542

distance, path = fastdtw(find_points(activity1_id), find_points(activity2_id), dist=euclidean)

>>> print(distance)
2.91985018100644
```

These two runs are both near my house so the value is small. Let’s change the second route to be from my trip to New York:

```python
activity1_id = 1361109741
activity2_id = 1246017379

distance, path = fastdtw(find_points(activity1_id), find_points(activity2_id), dist=euclidean)

>>> print(distance)
29383.492965394034
```

Much bigger!

I’m not really interested in the actual value returned but I am interested in the relative values. I’m building a little application to generate routes that I should run and I want it to come up with routes that are different to the ones that I’ve run recently. This score can now form part of the criteria.
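One way the score could feed into that decision, purely as a sketch: pick the candidate whose closest recent run is furthest away. Everything here is hypothetical (`most_different`, the toy `start_gap` distance, the coordinates). With the code above, the `distance` argument could be something like `lambda a, b: fastdtw(a, b, dist=euclidean)[0]`.

```python
import math


def most_different(candidates, recent_runs, distance):
    """Return the candidate route that is least similar to any recent run."""
    def novelty(route):
        # A route is only as novel as its distance to the most similar recent run.
        return min(distance(route, run) for run in recent_runs)
    return max(candidates, key=novelty)


# Toy stand-in for a DTW score: the gap between the starting points.
def start_gap(a, b):
    return math.dist(a[0], b[0])


recent = [[(51.50, -0.12), (51.51, -0.12)]]
candidates = [
    [(51.50, -0.12), (51.52, -0.12)],    # starts where the recent run started
    [(40.71, -74.00), (40.72, -74.00)],  # starts somewhere else entirely
]

print(most_different(candidates, recent, start_gap))
# [(40.71, -74.0), (40.72, -74.0)]
```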

Written by Mark Needham

January 18th, 2018 at 11:35 pm


In my last post I showed how to map Strava runs using data that I’d extracted from their /activities API, but the API returns a lot of other data that I discarded because I wasn’t sure what I should keep.

The API returns a nested JSON structure so the easiest solution would be to save each run as an individual file but I’ve always wanted to try out PostgreSQL’s JSON data type and this seemed like a good opportunity.

Creating a JSON ready PostgreSQL table

First up we need to create a database in which we’ll store our Strava data. Let’s name it appropriately:

```sql
CREATE DATABASE strava;
\connect strava
```

Now we can create a table with one field that uses the JSON data type:

```sql
CREATE TABLE runs (
  id INTEGER NOT NULL,
  data jsonb
);

ALTER TABLE runs ADD PRIMARY KEY(id);
```

Easy enough. Now we’re ready to populate the table.

Importing Strava API

We can partially reuse the script from the last post, except that rather than saving to a CSV file we’ll save to PostgreSQL using the psycopg2 library.

The script relies on a TOKEN environment variable. If you want to try this on your own Strava account you’ll need to create an application, which will give you a key.

extract-runs.py

```python
import requests
import os
import json
import psycopg2

token = os.environ["TOKEN"]
headers = {'Authorization': "Bearer {0}".format(token)}

with psycopg2.connect("dbname=strava user=markneedham") as conn:
    with conn.cursor() as cur:
        page = 1
        while True:
            r = requests.get("https://www.strava.com/api/v3/athlete/activities?page={0}".format(page), headers = headers)
            response = r.json()

            if len(response) == 0:
                break
            else:
                for activity in response:
                    r = requests.get("https://www.strava.com/api/v3/activities/{0}?include_all_efforts=true".format(activity["id"]), headers = headers)
                    json_response = r.json()
                    cur.execute("INSERT INTO runs (id, data) VALUES(%s, %s)", (activity["id"], json.dumps(json_response)))
                    conn.commit()
                page += 1
```
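The page loop in that script (keep asking for the next page until an empty one comes back) can be pulled out into a small generator. This is only a sketch: `iter_activities` and `fetch_page` are names I’ve invented, and in the real script `fetch_page` would wrap the `requests.get` call to `/athlete/activities`.

```python
def iter_activities(fetch_page):
    """Yield activities one by one, fetching pages until an empty page is returned."""
    page = 1
    while True:
        activities = fetch_page(page)
        if not activities:  # an empty list means we've run out of pages
            return
        yield from activities
        page += 1


# Fake API for illustration: two pages of activities, then nothing.
fake_pages = {1: [{"id": 101}, {"id": 102}], 2: [{"id": 103}]}

def fake_fetch(page):
    return fake_pages.get(page, [])

print([activity["id"] for activity in iter_activities(fake_fetch)])
# [101, 102, 103]
```

Separating the paging from the saving means the same loop could feed either the CSV writer or the PostgreSQL insert.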

Querying Strava

We can now write some queries against our newly imported data.

My quickest runs

```sql
SELECT id, data->>'start_date' AS start_date,
       (data->>'average_speed')::FLOAT AS speed
FROM runs
ORDER BY speed DESC
LIMIT 5

    id     |      start_date      | speed
-----------+----------------------+-------
 649253963 | 2016-07-22T05:18:37Z | 3.736
 914796614 | 2017-03-26T08:37:56Z | 3.614
 653703601 | 2016-07-26T05:25:07Z | 3.606
 548540883 | 2016-04-17T18:18:05Z | 3.604
 665006485 | 2016-08-05T04:11:21Z | 3.604
(5 rows)
```

My longest runs

```sql
SELECT id, data->>'start_date' AS start_date,
       (data->>'distance')::FLOAT AS distance
FROM runs
ORDER BY distance DESC
LIMIT 5

    id     |      start_date      | distance
-----------+----------------------+----------
 840246999 | 2017-01-22T10:20:33Z |  10764.1
 461124609 | 2016-01-02T08:42:47Z |  10457.9
 467634177 | 2016-01-10T18:48:47Z |  10434.5
 471467618 | 2016-01-16T12:33:28Z |  10359.3
 540811705 | 2016-04-10T07:26:55Z |   9651.6
(5 rows)
```

Runs this year

```sql
SELECT COUNT(*)
FROM runs
WHERE data->>'start_date' >= '2017-01-01 00:00:00'

 count
-------
    62
(1 row)
```

Runs per year

```sql
SELECT EXTRACT(YEAR FROM to_date(data->>'start_date', 'YYYY-mm-dd')) AS year,
       COUNT(*)
FROM runs
GROUP BY year
ORDER BY year

 year | count
------+-------
 2014 |    18
 2015 |   139
 2016 |   166
 2017 |    62
(4 rows)
```

That’s all for now. Next I’m going to learn how to query segments, which are stored inside a nested array inside the JSON document. Stay tuned for that in a future post.

Written by Mark Needham

May 1st, 2017 at 7:11 pm

Posted in PostgreSQL


Leaflet: Mapping Strava runs/polylines on Open Street Map


I’m a big Strava user and spent a bit of time last weekend playing around with their API to work out how to map all my runs.

Strava API and polylines

This is a two step process:

1. Call the /athlete/activities/ endpoint to get a list of all my activities
2. For each of those activities call /activities/[activityId] endpoint to get more detailed information for each activity

That second API returns a ‘polyline’ property which the documentation describes as follows:

Activity and segment API requests may include summary polylines of their respective routes. The values are string encodings of the latitude and longitude points using the Google encoded polyline algorithm format.

If we navigate to that page we get the following explanation:

Polyline encoding is a lossy compression algorithm that allows you to store a series of coordinates as a single string.
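The “lossy” part is the rounding: each coordinate is scaled and rounded to an integer before the differences between consecutive points are encoded. Here is a rough sketch of the encoding side, assuming the usual precision of 5 decimal places. `encode_polyline` is my own name, and Python’s `round` may treat exact half-way values differently from other implementations.

```python
def encode_polyline(points, precision=5):
    """Encode (lat, lng) pairs as an encoded polyline string (illustrative sketch)."""
    factor = 10 ** precision
    chars = []
    prev_lat = prev_lng = 0
    for lat, lng in points:
        # This rounding is where precision is lost.
        lat_i, lng_i = round(lat * factor), round(lng * factor)
        for delta in (lat_i - prev_lat, lng_i - prev_lng):
            # Shift left and invert negatives so the sign ends up in the lowest bit.
            value = ~(delta << 1) if delta < 0 else delta << 1
            while value >= 0x20:
                # Emit 5 bits at a time, flagging that more chunks follow.
                chars.append(chr((0x20 | (value & 0x1F)) + 63))
                value >>= 5
            chars.append(chr(value + 63))
        prev_lat, prev_lng = lat_i, lng_i
    return "".join(chars)


print(encode_polyline([(40.63179, -8.65708), (40.62855, -8.65693)]))
# u{~vFvyys@fS]

# Extra decimal places are simply discarded:
print(encode_polyline([(40.6317941, -8.6570812)]) == encode_polyline([(40.63179, -8.65708)]))
# True
```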

I tried out a couple of my polylines using the interactive polyline encoder utility which worked well once I realised that I needed to escape backslashes (“\”) in the polyline before pasting it into the tool.
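The escaping matters because a backslash is a legal character in an encoded polyline but starts an escape sequence inside a JavaScript string literal. If the polyline is coming from Python anyway, `json.dumps` will do the escaping for you. A small illustration, using a made-up fragment of a polyline:

```python
import json

raw = "NQ\\Y^M"  # the actual polyline text contains ONE backslash: NQ\Y^M

js_literal = json.dumps(raw)
print(js_literal)
# "NQ\\Y^M"   <- quoted, with the backslash doubled, safe to paste into JavaScript

# Parsing the literal gives back the original, single-backslash string.
assert json.loads(js_literal) == raw
```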

Now that I’d figured out how to map one run it was time to automate the process.

Leaflet and OpenStreetMap

I’ve previously had a good experience using Leaflet so I was keen to use that and luckily came across a Stack Overflow answer showing how to do what I wanted.

I created an HTML file and manually pasted in a couple of my runs (not forgetting to escape those backslashes!) to check that they worked:

blog.html

```html
<html>
<head>
  <title>Mapping my runs</title>
</head>

<body>
  <script src="http://cdn.leafletjs.com/leaflet-0.7/leaflet.js"></script>
  <script type="text/javascript" src="https://rawgit.com/jieter/Leaflet.encoded/master/Polyline.encoded.js"></script>
  <link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.7/leaflet.css" />
  <div id="map" style="width: 100%; height: 100%"></div>

  <script>
    var map = L.map('map').setView([55.609818, 13.003286], 13);
    L.tileLayer(
      'http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png', {
        maxZoom: 18,
      }).addTo(map);

    var encodedRoutes = [
      "{zkrIm`inANPD?BDXGPKLATHNRBRFtAR~AFjAHl@D|ALtATj@HHJBL?`@EZ?NQ\\Y^MZURGJKR]RMXYh@QdAWf@[~@aAFGb@?j@YJKBU@m@FKZ[NSPKTCRJD?`@Wf@Wb@g@HCp@Qh@]z@SRMRE^EHJZnDHbBGPHb@NfBTxBN|DVbCBdA^lBFl@Lz@HbBDl@Lr@Bb@ApCAp@Ez@g@bEMl@g@`B_AvAq@l@QF]Rs@Nq@CmAVKCK?_@Nw@h@UJIHOZa@xA]~@UfASn@U`@_@~@[d@Sn@s@rAs@dAGN?NVhAB\\Ox@@b@S|A?Tl@jBZpAt@vBJhATfGJn@b@fARp@H^Hx@ARGNSTIFWHe@AGBOTAP@^\\zBMpACjEWlEIrCKl@i@nAk@}@}@yBOWSg@kAgBUk@Mu@[mC?QLIEUAuAS_E?uCKyCA{BH{DDgF`AaEr@uAb@oA~@{AE}AKw@g@qAU[_@w@[gAYm@]qAEa@FOXg@JGJ@j@o@bAy@NW?Qe@oCCc@SaBEOIIEQGaAe@kC_@{De@cE?KD[H[P]NcAJ_@DGd@Gh@UHI@Ua@}Bg@yBa@uDSo@i@UIICQUkCi@sCKe@]aAa@oBG{@G[CMOIKMQe@IIM@KB]Tg@Nw@^QL]NMPMn@@\\Lb@P~@XT",
      "u}krIq_inA_@y@My@Yu@OqAUsA]mAQc@CS@o@FSHSp@e@n@Wl@]ZCFEBK?OC_@Qw@?m@CSK[]]EMBeAA_@m@qEAg@UoCAaAMs@IkBMoACq@SwAGOYa@IYIyA_@kEMkC]{DEaAScC@yEHkGA_ALsCBiA@mCD{CCuAZcANOH@HDZl@Z`@RFh@\\TDT@ZVJBPMVGLM\\Mz@c@NCPMXERO|@a@^Ut@s@p@KJAJBd@EHEXi@f@a@\\g@b@[HUD_B@uADg@DQLCLD~@l@`@J^TF?JANQ\\UbAyABEZIFG`@o@RAJEl@_@ZENDDIA[Ki@BURQZaARODKVs@LSdAiAz@G`BU^A^GT@PRp@zARXRn@`BlDHt@ZlAFh@^`BX|@HHHEf@i@FAHHp@bBd@v@DRAVMl@i@v@SROXm@tBILOTOLs@NON_@t@KX]h@Un@k@\\c@h@Ud@]ZGNKp@Sj@KJo@b@W`@UPOX]XWd@UF]b@WPOAIBSf@QVi@j@_@V[b@Uj@YtAEFCCELARBn@`@lBjAzD^vB^hB?LENURkAv@[Ze@Xg@Py@p@QHONMA[HGAWE_@Em@Hg@AMCG@QHq@Cm@M[Jy@?UJIA{@Ae@KI@GFKNIX[QGAcAT[JK?OVMFK@IAIUKAYJI?QKUCGFIZCXDtAHl@@p@LjBCZS^ERAn@Fj@Br@Hn@HzAHh@RfD?j@TnCTlANjANb@\\z@TtARr@P`AFnAGfBG`@CFE?"
    ]

    for (let encoded of encodedRoutes) {
      var coordinates = L.Polyline.fromEncoded(encoded).getLatLngs();

      L.polyline(
        coordinates,
        {
          color: 'blue',
          weight: 2,
          opacity: .7,
          lineJoin: 'round'
        }
      ).addTo(map);
    }
  </script>
</body>
</html>
```

We can spin up a Python web server over that HTML file to see how it renders:

```
$ python -m http.server
Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
```

And below we can see both runs plotted on the map.

Automating Strava API to Open Street Map

The final step is to automate the whole thing so that I can see all of my runs.

I wrote the following script to call the Strava API and save the polyline for every run to a CSV file:

```python
import requests
import os
import sys
import csv

token = os.environ["TOKEN"]
headers = {'Authorization': "Bearer {0}".format(token)}

with open("runs.csv", "w") as runs_file:
    writer = csv.writer(runs_file, delimiter=",")
    writer.writerow(["id", "polyline"])

    page = 1
    while True:
        r = requests.get("https://www.strava.com/api/v3/athlete/activities?page={0}".format(page), headers = headers)
        response = r.json()

        if len(response) == 0:
            break
        else:
            for activity in response:
                r = requests.get("https://www.strava.com/api/v3/activities/{0}?include_all_efforts=true".format(activity["id"]), headers = headers)
                polyline = r.json()["map"]["polyline"]
                writer.writerow([activity["id"], polyline])
            page += 1
```

I then wrote a simple script using Flask to parse the CSV file and send a JSON representation of my runs to a slightly modified version of the HTML page that I described above:

```python
from flask import Flask
from flask import render_template
import csv
import json

app = Flask(__name__)

@app.route('/')
def my_runs():
    runs = []
    with open("runs.csv", "r") as runs_file:
        reader = csv.DictReader(runs_file)

        for row in reader:
            runs.append(row["polyline"])

    return render_template("leaflet.html", runs = json.dumps(runs))

if __name__ == "__main__":
    app.run(port = 5001)
```

I changed the following line in the HTML file:

`var encodedRoutes = {{ runs|safe }};`

Now we can launch our Flask web server:

```
$ python app.py
 * Running on http://127.0.0.1:5001/ (Press CTRL+C to quit)
```

And if we navigate to http://127.0.0.1:5001/ we can see all my runs that went near Westminster:

The full code for all the files I’ve described in this post is available on github. If you give it a try you’ll need to provide your Strava token in the ‘TOKEN’ environment variable before running extract_runs.py.