How to calculate the difference and percentage change between rows in Pandas

Learn how to use diff() and pct_change() in Pandas to calculate the difference and percentage change between rows.

How to calculate the difference and percentage change between rows in Pandas
Picture by Karolina Grabowska, Pexels.
9 minutes to read

When working with Pandas dataframes, it’s a very common task to calculate the difference between two rows. For example, you might want to calculate the difference in the number of visitors to your website between two days, or the difference in the price of a stock between two days.

There are actually a number of different ways to calculate the difference between two rows in Pandas and calculate their percentage change. In this post, we’ll look at two of the most common methods: diff() and pct_change(), which are designed specifically for this task, and doing the same thing across column values.

Import the data

To get started, open a new Jupyter notebook and import the data. We’ll use the pandas library to read the data from a CSV file into a dataframe using the read_csv() function. Pandas supports importing data from a number of different file formats, including CSV, Excel, JSON, and SQL.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/causal_impact_dataset.csv')
df.head()
date clicks impressions ctr position
0 2021-07-04 136 6301 2.16 31.97
1 2021-07-05 264 8697 3.04 27.66
2 2021-07-06 299 9236 3.24 26.38
3 2021-07-07 276 10008 2.76 26.77
4 2021-07-08 283 9725 2.91 25.83

Calculate the difference in between each row using diff()

Crucially, you need to ensure your Pandas dataframe has been sorted into a logical order before you calculate the differences between rows or their percentage change. You can do this by appending .sort_values(by='column_name_here') to the end of your dataframe, and passing in the column name you want to sort by.

To calculate the difference between selected values in each row of our dataframe we’ll simply append .diff() to the end of our column name and then assign the value to a new column in our dataframe. We don’t need to do it here, but the axis parameter can be used to calculate the difference between columns instead of rows, and the periods parameter can be used to calculate the difference between rows that are further apart than the next row by using shift().


```python
df['clicks_diff'] = df['clicks'].diff()
df['impressions_diff'] = df['impressions'].diff()
df['ctr_diff'] = df['ctr'].diff()
df['position_diff'] = df['position'].diff()
df.head(10)
date clicks impressions ctr position clicks_diff impressions_diff ctr_diff position_diff
0 2021-07-04 136 6301 2.16 31.97 NaN NaN NaN NaN
1 2021-07-05 264 8697 3.04 27.66 128.0 2396.0 0.88 -4.31
2 2021-07-06 299 9236 3.24 26.38 35.0 539.0 0.20 -1.28
3 2021-07-07 276 10008 2.76 26.77 -23.0 772.0 -0.48 0.39
4 2021-07-08 283 9725 2.91 25.83 7.0 -283.0 0.15 -0.94
5 2021-07-09 205 8273 2.48 27.65 -78.0 -1452.0 -0.43 1.82
6 2021-07-10 106 7186 1.48 33.74 -99.0 -1087.0 -1.00 6.09
7 2021-07-11 164 7431 2.21 31.75 58.0 245.0 0.73 -1.99
8 2021-07-12 316 11101 2.85 26.94 152.0 3670.0 0.64 -4.81
9 2021-07-13 314 11494 2.73 26.90 -2.0 393.0 -0.12 -0.04

Calculate the percentage change between each row with pct_change()

The same kind of approach can be used to calculate the percentage change between selected values in each row of our dataframe. As with diff(), we simply append .pct_change() to the end of the column name and then assign the value to a new column.

The pct_change() function will calculate the percentage change between each row and the previous row. This means that the first row will always be NaN as there is no previous row to compare it to. As with diff(), the pct_change() function has some other arguments that can be used to change the behaviour of the function.

For example, we can use the periods argument to specify the number of rows to compare to. This is useful if we want to compare the current row to a row that is not the previous row. For example, if we wanted to compare the current row to the row that was 3 rows ago, we could use periods=3.

df['clicks_pct_change'] = df['clicks'].pct_change()
df['impressions_pct_change'] = df['impressions'].pct_change()
df['ctr_pct_change'] = df['ctr'].pct_change()
df['position_pct_change'] = df['position'].pct_change()
df.head(10)
date clicks impressions ctr position clicks_diff impressions_diff ctr_diff position_diff clicks_pct_change impressions_pct_change ctr_pct_change position_pct_change
0 2021-07-04 136 6301 2.16 31.97 NaN NaN NaN NaN NaN NaN NaN NaN
1 2021-07-05 264 8697 3.04 27.66 128.0 2396.0 0.88 -4.31 0.941176 0.380257 0.407407 -0.134814
2 2021-07-06 299 9236 3.24 26.38 35.0 539.0 0.20 -1.28 0.132576 0.061975 0.065789 -0.046276
3 2021-07-07 276 10008 2.76 26.77 -23.0 772.0 -0.48 0.39 -0.076923 0.083586 -0.148148 0.014784
4 2021-07-08 283 9725 2.91 25.83 7.0 -283.0 0.15 -0.94 0.025362 -0.028277 0.054348 -0.035114
5 2021-07-09 205 8273 2.48 27.65 -78.0 -1452.0 -0.43 1.82 -0.275618 -0.149306 -0.147766 0.070461
6 2021-07-10 106 7186 1.48 33.74 -99.0 -1087.0 -1.00 6.09 -0.482927 -0.131391 -0.403226 0.220253
7 2021-07-11 164 7431 2.21 31.75 58.0 245.0 0.73 -1.99 0.547170 0.034094 0.493243 -0.058980
8 2021-07-12 316 11101 2.85 26.94 152.0 3670.0 0.64 -4.81 0.926829 0.493877 0.289593 -0.151496
9 2021-07-13 314 11494 2.73 26.90 -2.0 393.0 -0.12 -0.04 -0.006329 0.035402 -0.042105 -0.001485

Calculate the percentage change versus the same day last week

To calculate the percentage change in a metric versus the same day last week we can pass in a value to the periods argument of the pct_change() function. This will calculate the percentage change in the metric versus the same day last week.

# Calculate the percentage change between each row and the previous week
df['clicks_pct_change_week'] = df['clicks'].pct_change(periods=7)
df['impressions_pct_change_week'] = df['impressions'].pct_change(periods=7)
df['ctr_pct_change_week'] = df['ctr'].pct_change(periods=7)
df['position_pct_change_week'] = df['position'].pct_change(periods=7)

# Show the original data and the weekly percentage changes
df[['clicks', 'clicks_pct_change_week', 'impressions', 'impressions_pct_change_week', 'ctr', 'ctr_pct_change_week', 'position', 'position_pct_change_week']].tail(10)
clicks clicks_pct_change_week impressions impressions_pct_change_week ctr ctr_pct_change_week position position_pct_change_week
18 424 0.452055 15988 0.367080 2.65 0.060000 26.65 -0.026662
19 357 0.469136 14590 0.343586 2.45 0.093750 26.92 -0.094517
20 256 0.651613 11987 0.548508 2.14 0.070000 30.34 -0.081997
21 257 0.502924 12227 0.365078 2.10 0.099476 29.55 -0.054097
22 471 0.539216 15934 0.232900 2.96 0.248945 24.28 -0.104390
23 462 0.350877 15516 0.191339 2.98 0.133080 24.55 -0.051757
24 418 0.208092 15590 0.104029 2.68 0.093878 24.90 -0.058245
25 477 0.125000 16842 0.053415 2.83 0.067925 23.83 -0.105816
26 340 -0.047619 13867 -0.049554 2.45 0.000000 26.18 -0.027489
27 194 -0.242188 9749 -0.186702 1.99 -0.070093 31.42 0.035597

Matt Clarke, Saturday, September 10, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.