Calculating the time difference between two dates in Pandas can yield useful information to aid your analysis, help you understand the data, and guide a machine learning model on making a more accurate prediction.
Python is superb for time series data analysis and it’s easy to handle dates of various formats and add and subtract days from dates in Pandas to create new features or perform more useful data analyses.
In this tutorial I’ll show how you can calculate the time difference between two dates in Pandas and return the difference in minutes, seconds, hours, or days. It’s quick and easy to do, and it’s a great way to get started with the data.
First, open a Jupyter notebook and import the pandas
library. Then use the Pandas read_csv()
function to load the data. I’ve created a CSV file containing some dates you can use for this project which is hosted on my GitHub page.
If you print the head()
you’ll see that the data comprise three columns: the order_id
, the order_date
, and the despatch_date
from an ecommerce website. We’ll be using Pandas to calculate how long it took the operations team to despatch each order placed.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/dates.csv')
df.head()
order_id | order_date | despatch_date | |
---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 |
Next, let’s check the format of the data. Run df.info()
to return the details on the columns and data types present in the dataframe. As with many such datasets, the dates are actually stored as object
data types or strings. We’ll need to convert them to datetime
objects before we can use them in our calculations.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 100 non-null int64
1 order_date 100 non-null object
2 despatch_date 100 non-null object
dtypes: int64(1), object(2)
memory usage: 2.5+ KB
There are several useful Pandas functions for reformatting dates. We’ll be using the
to_datetime()
function to convert the dates to datetime
objects. The errors='coerce'
argument with return NaT
(or “not a time”) for any date that cannot be converted. We’ll reassign the order_date
and despatch_date
columns to datetime
objects and save them to the original Pandas dataframe columns.
df['order_date'] = pd.to_datetime(df['order_date'], errors='coerce')
df['despatch_date'] = pd.to_datetime(df['despatch_date'], errors='coerce')
We can now check the data types of the columns by running df.info()
again. As you can see from the output, the order_date
and despatch_date
columns are now datetime
objects.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 100 non-null int64
1 order_date 100 non-null datetime64[ns]
2 despatch_date 100 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(1)
memory usage: 2.5 KB
There are several ways to calculate the time difference between two dates in Python using Pandas. The first is to subtract one date from the other. This returns a timedelta such as 0 days 05:00:00
that tells us the number of days, hours, minutes, and seconds between the two dates. This can be useful for simple visualisations of time differences, but you may need the days, hours, minutes, and seconds in a more granular format.
df['handling_time'] = df['despatch_date'] - df['order_date']
df.head()
order_id | order_date | despatch_date | handling_time | |
---|---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 | 0 days 09:00:00 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 | 0 days 05:00:00 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 | 0 days 18:00:00 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 |
One way to obtain to calculate the difference between two dates with greater precision is to use the dt.total_seconds()
function. By wrapping our calculation in parentheses and then appending .total_seconds()
to the end of our calculation, we can obtain the number of seconds between the two dates and can assign it to a new column.
df['handling_time_seconds'] = (df['despatch_date'] - df['order_date']).dt.total_seconds()
df.head()
order_id | order_date | despatch_date | handling_time | handling_time_seconds | |
---|---|---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 | 0 days 09:00:00 | 32400.0 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 | 0 days 05:00:00 | 18000.0 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 | 0 days 18:00:00 | 64800.0 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 |
Since there are 60 seconds in every minute, we can simply divide the number of seconds between the two dates by 60 to obtain the number of minutes between the two dates. Again, we can assign the result to a new column.
df['handling_time_minutes'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60
df.head()
order_id | order_date | despatch_date | handling_time | handling_time_seconds | handling_time_minutes | |
---|---|---|---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 | 0 days 09:00:00 | 32400.0 | 540.0 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 | 0 days 05:00:00 | 18000.0 | 300.0 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 | 0 days 18:00:00 | 64800.0 | 1080.0 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 |
To calculate the time difference between the two dates in seconds we can divide the total_seconds()
value by 60 to obtain the minutes, then divide by 60 again to obtain the time difference in hours. We can then assign the time difference in hours to a new column in the dataframe.
df['handling_time_hours'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60 / 60
df.head()
order_id | order_date | despatch_date | handling_time | handling_time_seconds | handling_time_minutes | handling_time_hours | |
---|---|---|---|---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 | 0 days 09:00:00 | 32400.0 | 540.0 | 9.0 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 | 0 days 05:00:00 | 18000.0 | 300.0 | 5.0 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 | 0 days 18:00:00 | 64800.0 | 1080.0 | 18.0 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 | 2.0 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 | 2.0 |
Finally, we’ll calculate the time difference between the two dates in days by modifying the step above to divide by 24 hours. That gives us the time difference in days. This is not the only way to achieve this in Pandas, but it’s one of the quickest and easiest to use, and it runs quickly, even on large time series datasets.
df['handling_time_days'] = (df['despatch_date'] - df['order_date']).dt.total_seconds() / 60 / 60 / 24
df['handling_time_full_days'] = (df['despatch_date'] - df['order_date']).dt.days
df.head()
order_id | order_date | despatch_date | handling_time | handling_time_seconds | handling_time_minutes | handling_time_hours | handling_time_full_days | handling_time_days | |
---|---|---|---|---|---|---|---|---|---|
0 | 134763 | 2021-11-23 04:11:00 | 2021-11-23 13:11:00 | 0 days 09:00:00 | 32400.0 | 540.0 | 9.0 | 0 | 0.375000 |
1 | 134764 | 2021-11-23 04:11:00 | 2021-11-23 09:11:00 | 0 days 05:00:00 | 18000.0 | 300.0 | 5.0 | 0 | 0.208333 |
2 | 134765 | 2021-11-23 07:11:00 | 2021-11-24 01:11:00 | 0 days 18:00:00 | 64800.0 | 1080.0 | 18.0 | 0 | 0.750000 |
3 | 134766 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 | 2.0 | 0 | 0.083333 |
4 | 134767 | 2021-11-23 08:11:00 | 2021-11-23 10:11:00 | 0 days 02:00:00 | 7200.0 | 120.0 | 2.0 | 0 | 0.083333 |
Matt Clarke, Thursday, December 23, 2021