The Pandas shift()
function is used to shift the position of a dataframe or series by a specified number of periods. It’s commonly used for the creation of so-called lagged variables or lagged features in time series datasets.
In this quick tutorial, we’ll cover the basics of the shift()
function and show how you can create lagged variables from values in a Pandas dataframe.
To get started, open a Jupyter notebook and import the Pandas library. Then, either import your own dataset into Pandas, or create a dataframe of dummy data. We’ll use a dictionary to create a dataframe of dates and order volumes.
import pandas as pd
df = pd.DataFrame({'date': ['2023-01-01', '2023-01-02', '2023-01-03',
'2023-01-04', '2023-01-05', '2023-01-06',
'2023-01-07', '2023-01-08', '2023-01-09'],
'orders': [3432, 3233, 3434, 3235, 3436,
3237, 3438, 3239, 3440]})
df
date | orders | |
---|---|---|
0 | 2023-01-01 | 3432 |
1 | 2023-01-02 | 3233 |
2 | 2023-01-03 | 3434 |
3 | 2023-01-04 | 3235 |
4 | 2023-01-05 | 3436 |
5 | 2023-01-06 | 3237 |
6 | 2023-01-07 | 3438 |
7 | 2023-01-08 | 3239 |
8 | 2023-01-09 | 3440 |
To use the shift()
method you simply append the function to your dataframe column and pass in an integer argument to define how many rows you want to shift. To create a lag based on one day in our date column we’ll use shift(1)
, and to create a lag based on one week we’ll use shift(7)
, then we’ll use the Pandas assign()
method to create a new column, and we’ll save it back to the dataframe.
If you look at new columns you’ll see that the orders_lag1
column contains either a NaN
value when there was no previous row to shift, or the value of the row one row above. Similarly, for the orders_lag7
column, we have NaN
values for the first seven dates because there were insufficient prior values.
df = df.assign(
orders_lag1 = df['orders'].shift(1),
orders_lag7 = df['orders'].shift(7)
)
df
date | orders | orders_lag1 | orders_lag7 | |
---|---|---|---|---|
0 | 2023-01-01 | 3432 | NaN | NaN |
1 | 2023-01-02 | 3233 | 3432.0 | NaN |
2 | 2023-01-03 | 3434 | 3233.0 | NaN |
3 | 2023-01-04 | 3235 | 3434.0 | NaN |
4 | 2023-01-05 | 3436 | 3235.0 | NaN |
5 | 2023-01-06 | 3237 | 3436.0 | NaN |
6 | 2023-01-07 | 3438 | 3237.0 | NaN |
7 | 2023-01-08 | 3239 | 3438.0 | 3432.0 |
8 | 2023-01-09 | 3440 | 3239.0 | 3233.0 |
The other really neat thing you can do with shift()
is combine it with a groupby()
operation. To show this in action we’ll first create another dataframe containing various customer IDs, order IDs, order dates, and order amounts.
df = pd.DataFrame({'order_id': [100, 101, 102, 103, 104, 105],
'customer_id': [1, 2, 3, 1, 2, 2],
'order_date': ['2023-01-01', '2023-01-01', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06'],
'order_amount': [129.23, 82.23, 343.23, 12.22, 53.22, 292.11]})
df
order_id | customer_id | order_date | order_amount | |
---|---|---|---|---|
0 | 100 | 1 | 2023-01-01 | 129.23 |
1 | 101 | 2 | 2023-01-01 | 82.23 |
2 | 102 | 3 | 2023-01-03 | 343.23 |
3 | 103 | 1 | 2023-01-04 | 12.22 |
4 | 104 | 2 | 2023-01-05 | 53.22 |
5 | 105 | 2 | 2023-01-06 | 292.11 |
Next, we’ll use method chaining and a lambda function to calculate the date of each customer’s previous order, then we’ll use the Pandas to_datetime()
method to calculate how many days have passed between the date of the current order and the date of the previous one, and we’ll return the value in days.
# Group by customer_id and return the date of the previous order
df = (df
.assign(
last_order_date = lambda x: x.groupby('customer_id')['order_date'].shift(1),
days_since_last_order = lambda x: (pd.to_datetime(x['order_date'])
- pd.to_datetime(x['last_order_date'])).dt.days
)
)
df
order_id | customer_id | order_date | order_amount | last_order_date | days_since_last_order | |
---|---|---|---|---|---|---|
0 | 100 | 1 | 2023-01-01 | 129.23 | NaN | NaN |
1 | 101 | 2 | 2023-01-01 | 82.23 | NaN | NaN |
2 | 102 | 3 | 2023-01-03 | 343.23 | NaN | NaN |
3 | 103 | 1 | 2023-01-04 | 12.22 | 2023-01-01 | 3.0 |
4 | 104 | 2 | 2023-01-05 | 53.22 | 2023-01-01 | 4.0 |
5 | 105 | 2 | 2023-01-06 | 292.11 | 2023-01-05 | 1.0 |
Matt Clarke, Monday, January 16, 2023