· python pandas covid-vaccines video

Pandas - Format DataFrame numbers with commas and control decimal places

I’m still playing around with the UK’s COVID-19 vaccination data and in this blog post we’ll learn how to format a DataFrame that contains a mix of string and numeric values.

Note

On 10th November 2022 I created a video that covers the same content as this blog post. Let me know if it’s helpful 😊

We’ll be using Pandas' styling functionality, which generates CSS and HTML, so if you want to follow along you’ll need to install Pandas and Jupyter:

pip install pandas jupyter

Next, launch Jupyter and create a notebook:

jupyter notebook

Let’s start by importing Pandas:

import pandas as pd

And now we’ll create a DataFrame containing the data that we want to format:

df = pd.DataFrame({
    "LTLA Name": ["Amber Valley", "Ashfield", "Bassetlaw"],
    "Population": [72179, 77988, 70832],
    "PercentageVaccinated": [0.228571, 0.223342, 0.184493]
})
Table 1. Results
LTLA Name Population PercentageVaccinated

0

Amber Valley

72179

0.228571

1

Ashfield

77988

0.223342

2

Bassetlaw

70832

0.184493

One way to do this is to format the values in place, as shown below:

df.loc[:, "Population"] = df["Population"].map('{:,d}'.format)
df.loc[:, "PercentageVaccinated"] = df["PercentageVaccinated"].map('{:.2f}'.format)

After this transformation, the DataFrame looks like this:

Table 2. Results
LTLA Name Population PercentageVaccinated

0

Amber Valley

72,179

0.23

1

Ashfield

77,988

0.22

2

Bassetlaw

70,832

0.18

This works, but it changes the underlying values in the DataFrame to be objects, which we can see by calling the dtypes function:

df.dtypes
Output
LTLA Name               object
Population              object
PercentageVaccinated    object
dtype: object

This means that we can’t do any number based computations anymore. For example, if we try to multiple the Population and PercentageVaccinated columns:

df.Population * df.PercentageVaccinated
Output
Traceback (most recent call last):
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 142, in _na_arithmetic_op
    result = expressions.evaluate(op, left, right)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/computation/expressions.py", line 235, in evaluate
    return _evaluate(op, op_str, a, b)  # type: ignore[misc]
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/computation/expressions.py", line 69, in _evaluate_standard
    return op(a, b)
TypeError: can't multiply sequence by non-int of type 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/common.py", line 65, in new_method
    return method(self, other)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/arraylike.py", line 105, in __mul__
    return self._arith_method(other, operator.mul)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/series.py", line 4998, in _arith_method
    result = ops.arithmetic_op(lvalues, rvalues, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 189, in arithmetic_op
    res_values = _na_arithmetic_op(lvalues, rvalues, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 149, in _na_arithmetic_op
    result = _masked_arith_op(left, right, op)
  File "/home/markhneedham/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/core/ops/array_ops.py", line 91, in _masked_arith_op
    result[mask] = op(xrav[mask], yrav[mask])
TypeError: can't multiply sequence by non-int of type 'str'

So if we want to use the underlying data for anything else, formatting like this isn’t a good solution.

Instead we can use Pandas styling functionality. The documentation is a bit overwhelming, but Chris Moffitt has a great introductory article on the styling API.

If we want to apply the same formatting to every column, we can pass a style to style.format. e.g. we could restrict every column to 2 decimal places, as shown below:

df.style.format("{.2f")
Note

For a description of valid format values, see the Format Specification Mini-Language documentation or Python String Format Cookbook.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in _repr_html_(self)
    203         Hooks into Jupyter notebook rich display system.
    204         """
--> 205         return self.render()
    206
    207     @doc(

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in render(self, **kwargs)
    619         self._compute()
    620         # TODO: namespace all the pandas keys
--> 621         d = self._translate()
    622         # filter out empty styles, every cell will have a class
    623         # but the list of props may just be [['', '']].

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in _translate(self)
    403                     "value": value,
    404                     "class": " ".join(cs),
--> 405                     "display_value": formatter(value),
    406                     "is_visible": (c not in hidden_columns),
    407                 }

~/.local/share/virtualenvs/covid-vaccines-xEbcGJTy/lib/python3.8/site-packages/pandas/io/formats/style.py in <lambda>(x)
   1715 ) -> Callable:
   1716     if isinstance(formatter, str):
-> 1717         formatter_func = lambda x: formatter.format(x)
   1718     elif callable(formatter):
   1719         formatter_func = formatter

ValueError: Unknown format code 'f' for object of type 'str'

That doesn’t work because the LTLA Name column contains string values, which can’t be formatted as a number. We can work around that problem by dropping the LTLA Name column:

df.drop(["LTLA Name"], axis=1).style.format("{:.2f}")
Table 3. Results
Population PercentageVaccinated

0

72179.00

0.23

1

77988.00

0.22

2

70832.00

0.18

This works, but we’ve lost the LTLA Name column and the Population column isn’t formatted how we’d like. Instead of passing a single style to style.format, we can instead pass a dictionary of {"column: "style"}. So to style Population with a comma as thousands separator and PercentageVaccinated with two decimal places, we can do the following:

df.style.format({
    "Population": "{:,d}",
    "PercentageVaccinated": "{:.2f}"
})
Table 4. Results
LTLA Name Population PercentageVaccinated

0

Amber Valley

72,179

0.23

1

Ashfield

77,988

0.22

2

Bassetlaw

70,832

0.18

And if we go one step further, we can also use the hide_index function to get rid of the index column:

df.style.format({
    "Population": "{:,d}",
    "PercentageVaccinated": "{:.2f}"
}).hide_index()
Table 5. Results
LTLA Name Population PercentageVaccinated

Amber Valley

72,179

0.23

Ashfield

77,988

0.22

Bassetlaw

70,832

0.18

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket