· python pandas covid-vaccines

Pandas: Filter column value in array/list - ValueError: The truth value of a Series is ambiguous

The UK government publishes Coronavirus vaccinations data on coronavirus.data.gov.uk, but I wanted to create some different visualisations so I downloaded the data and have been playing with it in the mneedham/covid-vaccines GitHub repository.

I massaged the data so that I have rows in a Pandas DataFrame representing the numbers of first doses, second doses, and total doses done each day. I then wanted to filter this DataFrame based on the type of dose, but initially got a bit stuck. In this blog post we’ll explore how I got around this problem.

The DataFrame that we’re working with looks like the one below:

df = pd.DataFrame({
    "date": ["2021-03-25", "2021-03-25", "2021-03-25"],
    "dose": ["firstDose", "secondDose", "totalDoses"],
    "vaccinations": [324942, 234382, 559324]
})
Table 1. Results
date dose vaccinations

0

2021-03-25

firstDose

324942

1

2021-03-25

secondDose

234382

2

2021-03-25

totalDoses

559324

I wanted to filter the DataFrame to return only the rows with a dose value of firstDose or secondDose.

One way of doing this would be to combine filters for firstDose or secondDose. I initially tried to do that using the Python or clause:

df[(df.dose == "firstDose") or (df.dose == "secondDose")]

But that throws the following error:

Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/generic.py", line 1442, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Instead we have to use the | operator, as shown below:

df[(df.dose == "firstDose") | (df.dose == "secondDose")]
Table 2. Results
date dose vaccinations

0

2021-03-25

firstDose

324942

1

2021-03-25

secondDose

234382

That works fine if we’re filtering on two values, but if we wanted to add a 3rd or 4th filter, our expression would get more complicated. Instead it would be cool if we could filter on a list of values. I initially tried to do this using the following code:

df[df.dose in ["firstDose", "secondDose"]]

But that takes us back to the same error that we had before:

Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/generic.py", line 1442, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This time we can use Pandas' isin function to solve the problem, as shown below:

df[df.dose.isin(["firstDose", "secondDose"])]
Table 3. Results
date dose vaccinations

0

2021-03-25

firstDose

324942

1

2021-03-25

secondDose

234382

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket