· python pandas altair

Pandas: Compare values in DataFrame to previous days

I’m still playing around with Covid vaccine data, this time exploring how the number of doses varies week by week. I want to know how many more (or less) vaccines have been done on a given day compared to that same day last week.

We’ll be using Pandas in this blog post, so let’s first install that library and import it:

Install Pandas
pip install pandas
Import module
import pandas as pd

And now let’s create a DataFrame containing a subset of the data that I’m working with:

df = pd.DataFrame([
    {'date': '2021-01-11', 'vaccinations': 165844.0},
    {'date': '2021-01-18', 'vaccinations': 208641.0},
    {'date': '2021-01-25', 'vaccinations': 281725.0},
    {'date': '2021-02-01', 'vaccinations': 352935.0},
    {'date': '2021-02-08', 'vaccinations': 356291.0}
])
Table 1. Output
date vaccinations

2021-01-11

165844.0

2021-01-18

208641.0

2021-01-25

281725.0

2021-02-01

352935.0

2021-02-08

356291.0

I’ve filtered the data to include the data for 5 Mondays start from the beginning of January. In the real data set we have the data for every day from January until today.

I want to add a column that shows the number of vaccinations done on the previous week. So:

  • for 2021-01-18, we’d have the value for 2021-01-11

  • for 2021-01-25, we’d have the value for 2021-02-18

and so on.

It took me a while to find the function to do this, but it turns out that pandas.Series.shift is what we want. This function shifts the index by the desired number of periods. We’ll pass in periods=1 because we want to shift every row down by 1:

df["vaccinations"].shift(periods=1)
Note

I should note that for the real DataFrame I had data for every day and therefore wanted to compare the data from 7 rows earlier in the DataFrame,which meant I had to use periods=7. If we wanted to compare the data from 3 days earlier, we could use periods=3, and so on.

Table 2. Output
vaccinations

NaN

165844.0

208641.0

281725.0

352935.0

We can then add a new column to our DataFrame using the following code:

df.loc[:, "lastWeekVaccinations"] = df["vaccinations"].shift(periods=1)
Table 3. Output
date vaccinations lastWeekVaccinations

2021-01-11

165844.0

NaN

2021-01-18

208641.0

165844.0

2021-01-25

281725.0

208641.0

2021-02-01

352935.0

281725.0

2021-02-08

356291.0

352935.0

If we then wanted to compute the difference between this week and last week, we could compute this by writing the following code:

df["vaccinations"] - df["lastWeekVaccinations"]

Or we could use the pandas.Series.diff function, which achieves the same thing:

df.loc[:, "lastWeekVaccinationsDiff"] = df["vaccinations"].diff(periods=1)
Table 4. Output
date vaccinations lastWeekVaccinations lastWeekVaccinationsDiff

2021-01-11

165844.0

NaN

NaN

2021-01-18

208641.0

165844.0

42797.0

2021-01-25

281725.0

208641.0

73084.0

2021-02-01

352935.0

281725.0

71210.0

2021-02-08

356291.0

352935.0

3356.0

We can also compute the percentage change between the weeks using pandas.Series.pct_change:

df.loc[:, "lastWeekVaccinationsChange"] = df["vaccinations"].pct_change(periods=1)
Table 5. Output
date vaccinations lastWeekVaccinations lastWeekVaccinationsDiff lastWeekVaccinationsChange

2021-01-11

165844.0

NaN

NaN

NaN

2021-01-18

208641.0

165844.0

42797.0

0.2580557632473892

2021-01-25

281725.0

208641.0

73084.0

0.3502858977861494

2021-02-01

352935.0

281725.0

71210.0

0.25276422042772206

2021-02-08

356291.0

352935.0

3356.0

0.009508833071245393

With these functions I’ve been able to do exactly what I wanted and could then create a chart based on this data.

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket