How to calculate percentage change between columns in Pandas

Picture by Mikhail Nilov, Pexels.

8 minutes to read

Data Science Pandas

When working with Pandas dataframes you’ll often need to calculate the percentage change or percentage difference between the values in two columns. There are various ways to do this in Pandas.

In this quick and easy tutorial, I’ll show you three different approaches you can use to calculate the percentage change between two columns, including the Pandas pct_change() function, lambda functions, and custom functions added using both apply() and assign().

Create a Pandas dataframe

To get started, open a Jupyter notebook, import the Pandas library using the import pandas as pd naming convention, and create a Pandas dataframe that contains two columns of numeric data to compare. The simple example dataset below the number of orders placed from each of five countries over two years.

import pandas as pd

df = pd.DataFrame({'country':['USA','Canada','Mexico','Brazil','UK'],
                   'orders_2022': [23, 12, 34, 45, 67],
                   'orders_2023': [12, 34, 56, 78, 90]})
df

	country	orders_2022	orders_2023
0	USA	23	12
1	Canada	12	34
2	Mexico	34	56
3	Brazil	45	78
4	UK	67	90

Calculate the percentage change between columns with pct_change()

Pandas, rather helpfully, includes a built-in function called pct_change() that allows you to calculate the percentage change across rows or columns in a dataframe. By default, pct_change() sets the optional axis parameter to 0 which means that it will calculate the percentage change between one row and the next. However, by setting axis=1 we can calculate the percentage change between columns instead.

df['pct_change'] = df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']
df

	country	orders_2022	orders_2023	pct_change
0	USA	23	12	-0.478261
1	Canada	12	34	1.833333
2	Mexico	34	56	0.647059
3	Brazil	45	78	0.733333
4	UK	67	90	0.343284

As you’ll notice above, pct_change() really returns a fractional change rather than a percentage change, so the -47.8% change in orders for the USA between 2022 and 2023 is shown as -0.478261 instead of -0.478261%. You need to multiply the value by 100 to get the actual percentage difference or change. You may also wish to use round() to round to two decimal places and cast the value to a str dtype and append a percentage symbol to aid readability.

df['pct_change_str'] = ((df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']) * 100).round(2).map(str) + '%'
df[['country', 'orders_2022', 'orders_2023', 'pct_change_str']]

	country	orders_2022	orders_2023	pct_change_str
0	USA	23	12	-47.83%
1	Canada	12	34	183.33%
2	Mexico	34	56	64.71%
3	Brazil	45	78	73.33%
4	UK	67	90	34.33%

Calculate percentage difference with a lambda function

Another way to calculate percentage difference or percentage change between Pandas columns is via a lambda function. These anonymous functions are useful when you want to apply a function to a Pandas dataframe or series without having to define a function first. We can calculate the percentage difference and multiply it by 100 to get the percentage in a single line of code using the apply() method.

df['pct_change_lambda'] = df[['orders_2022', 'orders_2023']].apply(lambda x: ((x[1] - x[0]) / x[0]) * 100, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_lambda']]

Calculate percentage difference using assign()

If you prefer to use the Pandas assign() method, you can do so as well. The assign() method also avoids the potential of getting the SettingWithCopyWarning error.

df = df.assign(pct_change_assign = lambda x: ((x['orders_2023'] - x['orders_2022']) / x['orders_2022']) * 100)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_assign']]

	country	orders_2022	orders_2023	pct_change_assign
0	USA	23	12	-47.826087
1	Canada	12	34	183.333333
2	Mexico	34	56	64.705882
3	Brazil	45	78	73.333333
4	UK	67	90	34.328358

Calculate percentage difference using a custom function

Finally, the other way to calculate the percentage difference between two columns is to create a custom function and apply it to the dataframe. While this means creating a custom function, it can result in cleaner code than using a lambda function, so it’s worth considering if you want to avoid using pct_change() so you’ve got total control over the output.

def custom_calc(x):
    return (((x[1] - x[0]) / x[0]) * 100).round(2)

df['custom_calc_apply'] = df[['orders_2022', 'orders_2023']].apply(custom_calc, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'custom_calc_apply']]

	country	orders_2022	orders_2023	custom_calc_apply
0	USA	23	12	-47.83
1	Canada	12	34	183.33
2	Mexico	34	56	64.71
3	Brazil	45	78	73.33
4	UK	67	90	34.33

Matt Clarke, Sunday, January 01, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.