When working with numeric data in Pandas you’ll often need to round numbers to the nearest whole number, round them up, round them down, or round them to two decimal places. While Pandas includes the round() method for basic rounding, you’ll need a little Numpy to round up or down. In this quick and easy tutorial, I’ll show you how it’s done.
To get started, open a Jupyter notebook and import the Pandas and Numpy libraries using the conventional pd and np aliases. We’ll need Numpy as well as Pandas, since a Pandas series has no dedicated methods for rounding numeric values up or down.
import pandas as pd
import numpy as np
Next, create a dataframe using the from_dict() method. We’ll include some dummy float values with lots of decimal places so we can round them in various ways.
data = {'sku': ['1', '2', '3', '4'],
        'price': [28.345343, 34.99, 29.000000, 42.3937289]}
df = pd.DataFrame.from_dict(data)
df
| | sku | price |
---|---|---|
0 | 1 | 28.345343 |
1 | 2 | 34.990000 |
2 | 3 | 29.000000 |
3 | 4 | 42.393729 |
If you have a float value you want to round to the nearest whole number you can call the round() method with no arguments. We’ll call df['price'].round() and assign the rounded value to a new column called price_round. As you’ll see, this rounds the number but doesn’t change the dtype of the data, so the value will still be a float.
df['price_round'] = df['price'].round()
df
| | sku | price | price_round |
---|---|---|---|
0 | 1 | 28.345343 | 28.0 |
1 | 2 | 34.990000 | 35.0 |
2 | 3 | 29.000000 | 29.0 |
3 | 4 | 42.393729 | 42.0 |
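One thing worth knowing about “nearest whole number”: values that sit exactly halfway between two integers are rounded to the nearest even number, which is the behaviour round() inherits from Numpy. The short sketch below uses a made-up series called halves (not part of our dataframe) purely to illustrate this.

```python
import pandas as pd

# Hypothetical values that sit exactly halfway between two whole numbers
halves = pd.Series([0.5, 1.5, 2.5, 3.5])

# round() sends exact halves to the nearest even number,
# so 0.5 becomes 0.0 and 2.5 becomes 2.0
rounded = halves.round()
print(rounded.tolist())
```

If you were expecting 2.5 to become 3, this can come as a surprise, so it’s worth checking any values that end in exactly .5.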
Chances are, if you’re rounding a float value to the nearest whole number, you’ll want an integer value like 28, rather than a float like 28.0. To solve this you can change the dtype of the column from float to int using the astype() method.
df['price_round'] = df['price'].round().astype(int)
df
| | sku | price | price_round |
---|---|---|---|
0 | 1 | 28.345343 | 28 |
1 | 2 | 34.990000 | 35 |
2 | 3 | 29.000000 | 29 |
3 | 4 | 42.393729 | 42 |
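The astype(int) approach assumes every value is present. If your column contains missing values, the cast will fail, because a standard int column can’t hold NaN. Here’s a minimal sketch of one way around that, using the nullable Int64 dtype. The prices series is a made-up example with a gap in it, not our dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical price data with one missing value
prices = pd.Series([28.345343, np.nan, 42.3937289])

# Casting to a plain int raises an error when NaN is present
try:
    prices.round().astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# The nullable 'Int64' dtype (capital I) keeps the gap as <NA>
nullable = prices.round().astype('Int64')
print(cast_failed)
print(nullable)
```

Note the capital I in 'Int64', which selects the Pandas nullable integer type rather than the Numpy int64 type.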
By passing a value to the round() method you can define the number of decimal places returned. For example, df['price'].round(2) will round 28.345343 to 28.35.
df['price_round_2'] = df['price'].round(2)
df
| | sku | price | price_round | price_round_2 |
---|---|---|---|---|
0 | 1 | 28.345343 | 28 | 28.35 |
1 | 2 | 34.990000 | 35 | 34.99 |
2 | 3 | 29.000000 | 29 | 29.00 |
3 | 4 | 42.393729 | 42 | 42.39 |
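If you need to round several columns at once, the dataframe itself also has a round() method, and it accepts a dictionary mapping column names to decimal places. The sketch below assumes a small made-up dataframe with a hypothetical weight column alongside price, just to show different precision per column.

```python
import pandas as pd

# Hypothetical dataframe; the 'weight' column is only here for illustration
df_example = pd.DataFrame({'price': [28.345343, 34.99],
                           'weight': [1.23456, 2.34567]})

# Round price to 2 decimal places and weight to 1 decimal place
rounded = df_example.round({'price': 2, 'weight': 1})
print(rounded)
```

Columns that aren’t named in the dictionary are left as they are.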
To round a number up in Pandas you can use Numpy’s ceil() function via the Pandas apply() method. This calculates the ceiling, the smallest whole number greater than or equal to a given float value, so 28.345343 will become 29 rather than the 28 you’d get if you just used round(). As with round(), the result is still a float.
df['price_round_up'] = df['price'].apply(np.ceil)
df
| | sku | price | price_round | price_round_2 | price_round_up |
---|---|---|---|---|---|
0 | 1 | 28.345343 | 28 | 28.35 | 29.0 |
1 | 2 | 34.990000 | 35 | 34.99 | 35.0 |
2 | 3 | 29.000000 | 29 | 29.00 | 29.0 |
3 | 4 | 42.393729 | 42 | 42.39 | 43.0 |
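You don’t strictly have to go through apply() here. Numpy functions like np.ceil() accept a Pandas series directly and return a series, which is a little shorter to write. Here’s a quick sketch on a standalone prices series (the same values as our price column) showing that the two approaches give the same result, plus a cast to int if you want whole numbers.

```python
import numpy as np
import pandas as pd

prices = pd.Series([28.345343, 34.99, 29.0, 42.3937289])

# The approach used above: pass np.ceil to apply()
via_apply = prices.apply(np.ceil)

# The alternative: call np.ceil on the series directly
direct = np.ceil(prices)

# Both give the same float values; cast to int if you want whole numbers
as_int = direct.astype(int)
print(direct.equals(via_apply))
print(as_int.tolist())
```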
Similarly, to round a number down in Pandas you can use the Numpy floor() function along with the Pandas apply() method. Given a value like 34.990000 this will round the number down to 34, rather than the 35 you’d get with round().
df['price_round_down'] = df['price'].apply(np.floor)
df
| | sku | price | price_round | price_round_2 | price_round_up | price_round_down |
---|---|---|---|---|---|---|
0 | 1 | 28.345343 | 28 | 28.35 | 29.0 | 28.0 |
1 | 2 | 34.990000 | 35 | 34.99 | 35.0 | 34.0 |
2 | 3 | 29.000000 | 29 | 29.00 | 29.0 | 29.0 |
3 | 4 | 42.393729 | 42 | 42.39 | 43.0 | 42.0 |
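Neither ceil() nor floor() takes a decimal places argument, so if you want to round down to two decimal places you need to do a little arithmetic. One common trick, sketched below on a made-up prices series, is to multiply by 100, take the floor, then divide by 100 again. Treat this as a rough sketch: because floats can’t represent every decimal exactly, values that already have exactly two decimal places can occasionally come out one cent lower than you’d expect.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with more than two decimal places
prices = pd.Series([28.345343, 42.3937289])

# Shift two decimal places left, round down, then shift back
floored_2dp = np.floor(prices * 100) / 100
print(floored_2dp.tolist())
```

Swapping np.floor for np.ceil in the same pattern rounds up to two decimal places instead.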
Matt Clarke, Thursday, January 05, 2023