Pandas: Compare values in DataFrame to previous days
I’m still playing around with Covid vaccine data, this time exploring how the number of doses varies week by week. I want to know how many more (or less) vaccines have been done on a given day compared to that same day last week.
We’ll be using Pandas in this blog post, so let’s first install that library and import it:
pip install pandas
import pandas as pd
And now let’s create a DataFrame containing a subset of the data that I’m working with:
df = pd.DataFrame([
{'date': '20210111', 'vaccinations': 165844.0},
{'date': '20210118', 'vaccinations': 208641.0},
{'date': '20210125', 'vaccinations': 281725.0},
{'date': '20210201', 'vaccinations': 352935.0},
{'date': '20210208', 'vaccinations': 356291.0}
])
date  vaccinations 

20210111 
165844.0 
20210118 
208641.0 
20210125 
281725.0 
20210201 
352935.0 
20210208 
356291.0 
I’ve filtered the data to include the data for 5 Mondays start from the beginning of January. In the real data set we have the data for every day from January until today.
I want to add a column that shows the number of vaccinations done on the previous week. So:

for
20210118
, we’d have the value for20210111

for
20210125
, we’d have the value for20210218
and so on.
It took me a while to find the function to do this, but it turns out that pandas.Series.shift
is what we want.
This function shifts the index by the desired number of periods.
We’ll pass in periods=1
because we want to shift every row down by 1:
df["vaccinations"].shift(periods=1)
Note

I should note that for the real DataFrame I had data for every day and therefore wanted to compare the data from 7 rows earlier in the DataFrame,which meant I had to use 
vaccinations 

NaN 
165844.0 
208641.0 
281725.0 
352935.0 
We can then add a new column to our DataFrame using the following code:
df.loc[:, "lastWeekVaccinations"] = df["vaccinations"].shift(periods=1)
date  vaccinations  lastWeekVaccinations 

20210111 
165844.0 
NaN 
20210118 
208641.0 
165844.0 
20210125 
281725.0 
208641.0 
20210201 
352935.0 
281725.0 
20210208 
356291.0 
352935.0 
If we then wanted to compute the difference between this week and last week, we could compute this by writing the following code:
df["vaccinations"]  df["lastWeekVaccinations"]
Or we could use the pandas.Series.diff
function, which achieves the same thing:
df.loc[:, "lastWeekVaccinationsDiff"] = df["vaccinations"].diff(periods=1)
date  vaccinations  lastWeekVaccinations  lastWeekVaccinationsDiff 

20210111 
165844.0 
NaN 
NaN 
20210118 
208641.0 
165844.0 
42797.0 
20210125 
281725.0 
208641.0 
73084.0 
20210201 
352935.0 
281725.0 
71210.0 
20210208 
356291.0 
352935.0 
3356.0 
We can also compute the percentage change between the weeks using pandas.Series.pct_change
:
df.loc[:, "lastWeekVaccinationsChange"] = df["vaccinations"].pct_change(periods=1)
date  vaccinations  lastWeekVaccinations  lastWeekVaccinationsDiff  lastWeekVaccinationsChange 

20210111 
165844.0 
NaN 
NaN 
NaN 
20210118 
208641.0 
165844.0 
42797.0 
0.2580557632473892 
20210125 
281725.0 
208641.0 
73084.0 
0.3502858977861494 
20210201 
352935.0 
281725.0 
71210.0 
0.25276422042772206 
20210208 
356291.0 
352935.0 
3356.0 
0.009508833071245393 
With these functions I’ve been able to do exactly what I wanted and could then create a chart based on this data.
About the author
I'm currently working on realtime userfacing analytics with Apache Pinot at StarTree. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also coauthored the O'Reilly Graph Algorithms Book with Amy Hodler.