How to calculate the time difference between two dates in Pandas

Learn how to calculate the time difference between two dates in Pandas and return the difference in minutes, seconds, hours, or days.

How to calculate the time difference between two dates in Pandas
Picture by Olya Kobruseva, Pexels.
11 minutes to read

Calculating the time difference between two dates in Pandas can yield useful information to aid your analysis, help you understand the data, and guide a machine learning model on making a more accurate prediction.

Python is superb for time series data analysis and it’s easy to handle dates of various formats and add and subtract days from dates in Pandas to create new features or perform more useful data analyses.

In this tutorial I’ll show how you can calculate the time difference between two dates in Pandas and return the difference in minutes, seconds, hours, or days. It’s quick and easy to do, and it’s a great way to get started with the data.

Load the data

First, open a Jupyter notebook and import the pandas library. Then use the Pandas read_csv() function to load the data. I’ve created a CSV file containing some dates you can use for this project which is hosted on my GitHub page.

If you print the head() you’ll see that the data comprise three columns: the order_id, the order_date, and the despatch_date from an ecommerce website. We’ll be using Pandas to calculate how long it took the operations team to despatch each order placed.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/dates.csv')
df.head()
order_id order_date despatch_date
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00

Check the format of the data

Next, let’s check the format of the data. Run df.info() to return the details on the columns and data types present in the dataframe. As with many such datasets, the dates are actually stored as object data types or strings. We’ll need to convert them to datetime objects before we can use them in our calculations.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   order_id       100 non-null    int64 
 1   order_date     100 non-null    object
 2   despatch_date  100 non-null    object
dtypes: int64(1), object(2)
memory usage: 2.5+ KB

Convert the dates to datetime format

There are several useful Pandas functions for reformatting dates. We’ll be using the to_datetime() function to convert the dates to datetime objects. The errors='coerce' argument with return NaT (or “not a time”) for any date that cannot be converted. We’ll reassign the order_date and despatch_date columns to datetime objects and save them to the original Pandas dataframe columns.

df['order_date'] = pd.to_datetime(df['order_date'], errors='coerce')
df['despatch_date'] = pd.to_datetime(df['despatch_date'], errors='coerce')

We can now check the data types of the columns by running df.info() again. As you can see from the output, the order_date and despatch_date columns are now datetime objects.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   order_id       100 non-null    int64         
 1   order_date     100 non-null    datetime64[ns]
 2   despatch_date  100 non-null    datetime64[ns]
dtypes: datetime64[ns](2), int64(1)
memory usage: 2.5 KB

Calculate the difference between two dates as a timedelta

There are several ways to calculate the time difference between two dates in Python using Pandas. The first is to subtract one date from the other. This returns a timedelta such as 0 days 05:00:00 that tells us the number of days, hours, minutes, and seconds between the two dates. This can be useful for simple visualisations of time differences, but you may need the days, hours, minutes, and seconds in a more granular format.

df['handling_time'] = df['despatch_date'] - df['order_date']
df.head()
order_id order_date despatch_date handling_time
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00 0 days 09:00:00
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00 0 days 05:00:00
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00 0 days 18:00:00
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00

Calculate the difference between two dates in seconds

One way to obtain to calculate the difference between two dates with greater precision is to use the dt.total_seconds() function. By wrapping our calculation in parentheses and then appending .total_seconds() to the end of our calculation, we can obtain the number of seconds between the two dates and can assign it to a new column.

df['handling_time_seconds'] = (df['despatch_date'] - df['order_date']).dt.total_seconds()
df.head()
order_id order_date despatch_date handling_time handling_time_seconds
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00 0 days 09:00:00 32400.0
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00 0 days 05:00:00 18000.0
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00 0 days 18:00:00 64800.0
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0

Calculate the difference between two dates in minutes

Since there are 60 seconds in every minute, we can simply divide the number of seconds between the two dates by 60 to obtain the number of minutes between the two dates. Again, we can assign the result to a new column.

df['handling_time_minutes'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60
df.head()
order_id order_date despatch_date handling_time handling_time_seconds handling_time_minutes
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00 0 days 09:00:00 32400.0 540.0
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00 0 days 05:00:00 18000.0 300.0
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00 0 days 18:00:00 64800.0 1080.0
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0

Calculate the difference between two dates in hours

To calculate the time difference between the two dates in seconds we can divide the total_seconds() value by 60 to obtain the minutes, then divide by 60 again to obtain the time difference in hours. We can then assign the time difference in hours to a new column in the dataframe.

df['handling_time_hours'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60 / 60
df.head()
order_id order_date despatch_date handling_time handling_time_seconds handling_time_minutes handling_time_hours
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00 0 days 09:00:00 32400.0 540.0 9.0
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00 0 days 05:00:00 18000.0 300.0 5.0
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00 0 days 18:00:00 64800.0 1080.0 18.0
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0 2.0
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0 2.0

Calculate the difference between two dates in days

Finally, we’ll calculate the time difference between the two dates in days by modifying the step above to divide by 24 hours. That gives us the time difference in days. This is not the only way to achieve this in Pandas, but it’s one of the quickest and easiest to use, and it runs quickly, even on large time series datasets.

df['handling_time_days'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60 / 60 / 24
df['handling_time_full_days'] = (df['despatch_date'] - df['order_date']).dt.days
df.head()
order_id order_date despatch_date handling_time handling_time_seconds handling_time_minutes handling_time_hours handling_time_full_days handling_time_days
0 134763 2021-11-23 04:11:00 2021-11-23 13:11:00 0 days 09:00:00 32400.0 540.0 9.0 0 0.375000
1 134764 2021-11-23 04:11:00 2021-11-23 09:11:00 0 days 05:00:00 18000.0 300.0 5.0 0 0.208333
2 134765 2021-11-23 07:11:00 2021-11-24 01:11:00 0 days 18:00:00 64800.0 1080.0 18.0 0 0.750000
3 134766 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0 2.0 0 0.083333
4 134767 2021-11-23 08:11:00 2021-11-23 10:11:00 0 days 02:00:00 7200.0 120.0 2.0 0 0.083333

Matt Clarke, Thursday, December 23, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Joining Data with pandas

Learn to combine data from multiple tables by joining data together using pandas.

Start course for FREE

Comments