How to calculate percentage change between columns in Pandas

Learn how to calculate the percentage change or percentage difference between two columns in a Pandas dataframe using the pct_change() function and various other techniques.

How to calculate percentage change between columns in Pandas
Picture by Mikhail Nilov, Pexels.
8 minutes to read

When working with Pandas dataframes you’ll often need to calculate the percentage change or percentage difference between the values in two columns. There are various ways to do this in Pandas.

In this quick and easy tutorial, I’ll show you three different approaches you can use to calculate the percentage change between two columns, including the Pandas pct_change() function, lambda functions, and custom functions added using both apply() and assign().

Create a Pandas dataframe

To get started, open a Jupyter notebook, import the Pandas library using the import pandas as pd naming convention, and create a Pandas dataframe that contains two columns of numeric data to compare. The simple example dataset below the number of orders placed from each of five countries over two years.

import pandas as pd

df = pd.DataFrame({'country':['USA','Canada','Mexico','Brazil','UK'],
                   'orders_2022': [23, 12, 34, 45, 67],
                   'orders_2023': [12, 34, 56, 78, 90]})
country orders_2022 orders_2023
0 USA 23 12
1 Canada 12 34
2 Mexico 34 56
3 Brazil 45 78
4 UK 67 90

Calculate the percentage change between columns with pct_change()

Pandas, rather helpfully, includes a built-in function called pct_change() that allows you to calculate the percentage change across rows or columns in a dataframe. By default, pct_change() sets the optional axis parameter to 0 which means that it will calculate the percentage change between one row and the next. However, by setting axis=1 we can calculate the percentage change between columns instead.

df['pct_change'] = df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']
country orders_2022 orders_2023 pct_change
0 USA 23 12 -0.478261
1 Canada 12 34 1.833333
2 Mexico 34 56 0.647059
3 Brazil 45 78 0.733333
4 UK 67 90 0.343284

As you’ll notice above, pct_change() really returns a fractional change rather than a percentage change, so the -47.8% change in orders for the USA between 2022 and 2023 is shown as -0.478261 instead of -0.478261%. You need to multiply the value by 100 to get the actual percentage difference or change. You may also wish to use round() to round to two decimal places and cast the value to a str dtype and append a percentage symbol to aid readability.

df['pct_change_str'] = ((df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']) * 100).round(2).map(str) + '%'
df[['country', 'orders_2022', 'orders_2023', 'pct_change_str']]
country orders_2022 orders_2023 pct_change_str
0 USA 23 12 -47.83%
1 Canada 12 34 183.33%
2 Mexico 34 56 64.71%
3 Brazil 45 78 73.33%
4 UK 67 90 34.33%

Calculate percentage difference with a lambda function

Another way to calculate percentage difference or percentage change between Pandas columns is via a lambda function. These anonymous functions are useful when you want to apply a function to a Pandas dataframe or series without having to define a function first. We can calculate the percentage difference and multiply it by 100 to get the percentage in a single line of code using the apply() method.

df['pct_change_lambda'] = df[['orders_2022', 'orders_2023']].apply(lambda x: ((x[1] - x[0]) / x[0]) * 100, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_lambda']]

Calculate percentage difference using assign()

If you prefer to use the Pandas assign() method, you can do so as well. The assign() method also avoids the potential of getting the SettingWithCopyWarning error.

df = df.assign(pct_change_assign = lambda x: ((x['orders_2023'] - x['orders_2022']) / x['orders_2022']) * 100)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_assign']]
country orders_2022 orders_2023 pct_change_assign
0 USA 23 12 -47.826087
1 Canada 12 34 183.333333
2 Mexico 34 56 64.705882
3 Brazil 45 78 73.333333
4 UK 67 90 34.328358

Calculate percentage difference using a custom function

Finally, the other way to calculate the percentage difference between two columns is to create a custom function and apply it to the dataframe. While this means creating a custom function, it can result in cleaner code than using a lambda function, so it’s worth considering if you want to avoid using pct_change() so you’ve got total control over the output.

def custom_calc(x):
    return (((x[1] - x[0]) / x[0]) * 100).round(2)

df['custom_calc_apply'] = df[['orders_2022', 'orders_2023']].apply(custom_calc, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'custom_calc_apply']]
country orders_2022 orders_2023 custom_calc_apply
0 USA 23 12 -47.83
1 Canada 12 34 183.33
2 Mexico 34 56 64.71
3 Brazil 45 78 73.33
4 UK 67 90 34.33

Matt Clarke, Sunday, January 01, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.