 Picture by Mikhail Nilov, Pexels.

When working with Pandas dataframes you’ll often need to calculate the percentage change or percentage difference between the values in two columns. There are various ways to do this in Pandas.

In this quick and easy tutorial, I’ll show you three different approaches you can use to calculate the percentage change between two columns, including the Pandas `pct_change()` function, `lambda` functions, and custom functions added using both `apply()` and `assign()`.

### Create a Pandas dataframe

To get started, open a Jupyter notebook, import the Pandas library using the `import pandas as pd` naming convention, and create a Pandas dataframe that contains two columns of numeric data to compare. The simple example dataset below the number of orders placed from each of five countries over two years.

``````import pandas as pd

'orders_2022': [23, 12, 34, 45, 67],
'orders_2023': [12, 34, 56, 78, 90]})
df
``````
country orders_2022 orders_2023
0 USA 23 12
2 Mexico 34 56
3 Brazil 45 78
4 UK 67 90

### Calculate the percentage change between columns with pct_change()

Pandas, rather helpfully, includes a built-in function called `pct_change()` that allows you to calculate the percentage change across rows or columns in a dataframe. By default, `pct_change()` sets the optional `axis` parameter to `0` which means that it will calculate the percentage change between one row and the next. However, by setting `axis=1` we can calculate the percentage change between columns instead.

``````df['pct_change'] = df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']
df
``````
country orders_2022 orders_2023 pct_change
0 USA 23 12 -0.478261
2 Mexico 34 56 0.647059
3 Brazil 45 78 0.733333
4 UK 67 90 0.343284

As you’ll notice above, `pct_change()` really returns a fractional change rather than a percentage change, so the -47.8% change in orders for the USA between 2022 and 2023 is shown as -0.478261 instead of -0.478261%. You need to multiply the value by 100 to get the actual percentage difference or change. You may also wish to use `round()` to round to two decimal places and cast the value to a `str` dtype and append a percentage symbol to aid readability.

``````df['pct_change_str'] = ((df[['orders_2022', 'orders_2023']].pct_change(axis=1)['orders_2023']) * 100).round(2).map(str) + '%'
df[['country', 'orders_2022', 'orders_2023', 'pct_change_str']]
``````
country orders_2022 orders_2023 pct_change_str
0 USA 23 12 -47.83%
2 Mexico 34 56 64.71%
3 Brazil 45 78 73.33%
4 UK 67 90 34.33%

### Calculate percentage difference with a lambda function

Another way to calculate percentage difference or percentage change between Pandas columns is via a lambda function. These anonymous functions are useful when you want to apply a function to a Pandas dataframe or series without having to define a function first. We can calculate the percentage difference and multiply it by 100 to get the percentage in a single line of code using the `apply()` method.

``````df['pct_change_lambda'] = df[['orders_2022', 'orders_2023']].apply(lambda x: ((x - x) / x) * 100, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_lambda']]
``````

### Calculate percentage difference using assign()

If you prefer to use the Pandas `assign()` method, you can do so as well. The `assign()` method also avoids the potential of getting the `SettingWithCopyWarning` error.

``````df = df.assign(pct_change_assign = lambda x: ((x['orders_2023'] - x['orders_2022']) / x['orders_2022']) * 100)
df[['country', 'orders_2022', 'orders_2023', 'pct_change_assign']]
``````
country orders_2022 orders_2023 pct_change_assign
0 USA 23 12 -47.826087
2 Mexico 34 56 64.705882
3 Brazil 45 78 73.333333
4 UK 67 90 34.328358

### Calculate percentage difference using a custom function

Finally, the other way to calculate the percentage difference between two columns is to create a custom function and apply it to the dataframe. While this means creating a custom function, it can result in cleaner code than using a `lambda` function, so it’s worth considering if you want to avoid using `pct_change()` so you’ve got total control over the output.

``````def custom_calc(x):
return (((x - x) / x) * 100).round(2)

df['custom_calc_apply'] = df[['orders_2022', 'orders_2023']].apply(custom_calc, axis=1)
df[['country', 'orders_2022', 'orders_2023', 'custom_calc_apply']]
``````
country orders_2022 orders_2023 custom_calc_apply
0 USA 23 12 -47.83