· python sortedcontainers

Python: Sorting lists of dictionaries with sortedcontainers

I was recently working on a Kafka streams data generator, where I only wanted to publish events once the time on those events had been reached. To solve this problem I needed a sorted list and in this blog post we’re going to explore how I went about doing this.

Note

I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:

Unsorted lists

We’re going to start out by created a list of dictionaries that contain a timestamp and name, as shown below:

import datetime as dt

now = dt.datetime.now()
items = [
  {"timestamp": now, "name": "Mark"},
  {"timestamp": now + dt.timedelta(hours=1), "name": "Dunith"},
  {"timestamp": now - dt.timedelta(minutes=27), "name": "Michael"}
]

If we want to print out this list, we could do so using a for loop:

for item in items:
    print(item)

We’d see the following output:

Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}

As you’d probably expect, this is sorted in insertion order.

If we want to sort the results we could wrap items in the sorted function, like this:

for item in sorted(items, key=lambda x: x["timestamp"]):
    print(item)

In this code sample we’re sorting by the timestamp property, and if we run this code, we’ll see the following output:

Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}

This time Michael comes first, then Mark, and finally Dunith.

When you want to do some adhoc sorting, this is a reasonable approach, but I wanted a list that’s always sorted.

sortedcontainers

Enter the sortedcontainers library, a collections library written in Python that’s super fast and supports sorted lists, maps, and dictionaries.

We’re only going to use the list, so let’s import that:

from sortedcontainers import SortedList

When we create an instance of SortedList we need to pass in a lamba expression that describes our sorting key:

sorted_items = SortedList(key=lambda x: x["timestamp"])

We can have it sort an existing list using the update function:

sorted_items.update(items)

Let’s print that list:

for item in sorted_items:
    print(item)
Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}

So far so good. We can also add items to the list:

sorted_items.add(
    {"timestamp": now + dt.timedelta(minutes=12), "name": "Jennifer"}
)

for item in sorted_items:
    print(item)
Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 48, 31, 757235), 'name': 'Jennifer'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}

And as we’d expect, the Jennifer list item gets sorted between Mark and Dunith.

The classic list operations are sorted as well.

Checking an item exists
{"timestamp": now, "name": "Mark"} in sorted_items
Output
True
Concatenating lists
for item in (sorted_items + sorted_items):
    print(item)
Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 36, 31, 757235), 'name': 'Mark'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 48, 31, 757235), 'name': 'Jennifer'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 48, 31, 757235), 'name': 'Jennifer'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}
Deleting items
sorted_items.remove({"timestamp": now, "name": "Mark"})

for item in sorted_items:
    print(item)
Output
{'timestamp': datetime.datetime(2022, 11, 24, 7, 9, 31, 757235), 'name': 'Michael'}
{'timestamp': datetime.datetime(2022, 11, 24, 7, 48, 31, 757235), 'name': 'Jennifer'}
{'timestamp': datetime.datetime(2022, 11, 24, 8, 36, 31, 757235), 'name': 'Dunith'}

In Summary

So far my experience with the sortedcontainers library is great - it’s done everything I expected and is really easy to use. Give it a try!

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket