How to use DATEDIFF() to calculate date differences in MySQL

Learn how to calculate the number of days between two dates in MySQL using the DATEDIFF() function. It's one of the most useful MySQL date functions for data scientists working with customer data or time series projects.

How to use DATEDIFF() to calculate date differences in MySQL
Picture by Pixabay, Pexels.
5 minutes to read

When working with customer data or upon time series data science projects, you’ll often find the need to calculate the difference between two dates in your MySQL queries. The MySQL DATEDIFF() function makes this process very simple and can be used for both dates stored in columns, or dates you’ve created yourself.

In this simple SQL tutorial I’ll show how you can use DATEDIFF() to calculate the difference between two dates and work out how long it’s been since a customer’s first and last orders using some real ecommerce data.

Set up your MySQL server

If you want to run these queries yourself in Python, you’ll first need access to a MySQL server and a database to query. I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset.

I’ve created a couple of step-by-step guides that you can follow to create a MySQL Docker container and import a MySQL database into the Docker container, so you can follow along on the very same data. It only takes a few minutes.

Install and load the packages

Next, you’ll need to install the SQLAlchemy package that is used to query MySQL and other databases via Pandas, and the PyMySQL driver for connecting to a MySQL server from Python. You can install these using the Pip package manager, then import the required modules into a Jupyter notebook or Python script.

!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine

Select the dates of each customer’s first and last order

query = """
SELECT * FROM customers

df = pd.read_sql(query, con=engine)
customer_id revenue orders skus units first_order_date last_order_date
0 14646.0 279489.02 77 703 196719 2010-12-20 10:09:00 2011-12-08 12:12:00
1 18102.0 256438.49 62 151 64122 2010-12-07 16:42:00 2011-12-09 11:50:00
2 17450.0 187482.17 55 127 69029 2010-12-07 09:23:00 2011-12-01 13:29:00
3 14911.0 132572.62 248 1794 77180 2010-12-01 14:05:00 2011-12-08 15:54:00
4 12415.0 123725.45 26 444 77242 2011-01-06 11:12:00 2011-11-15 14:22:00

Use DATEDIFF() to calculate the difference between two dates

The MySQL DATEDIFF() function takes two dates and returns the number of days between them. The first date is the later and the second is the earlier date (flipping them the other way around will return a negative value). So, for example, DATEDIFF(last_order_date, first_order_date) will return the number of days between a customer’s last order and their first order.

You can use DATEDIFF() in two main ways. You can either use two column values, as shown in the example above, or you can pass in the current date or another date you’ve created that is not currently stored in a MySQL column value.

For example, suppose we want to calculate how many days it’s been since each customer placed their first order. We can do this by passing in the NOW() function as the first argument, which returns the current MySQL datetime, and then pass first_order_date as the second argument like this: DATEDIFF(NOW(), first_order_date). You can do the same with last_order_date to find the number of days that have elapsed since a customer last placed an order.

query = """
    DATEDIFF(last_order_date, first_order_date) AS days_between_orders, 
    DATEDIFF(NOW(), first_order_date) AS days_since_first_order,
    DATEDIFF(NOW(), last_order_date) AS days_since_last_order
FROM customers

df = pd.read_sql(query, con=engine)
customer_id first_order_date last_order_date days_between_orders days_since_first_order days_since_last_order
0 14646.0 2010-12-20 10:09:00 2011-12-08 12:12:00 353 4145 3792
1 18102.0 2010-12-07 16:42:00 2011-12-09 11:50:00 367 4158 3791
2 17450.0 2010-12-07 09:23:00 2011-12-01 13:29:00 359 4158 3799
3 14911.0 2010-12-01 14:05:00 2011-12-08 15:54:00 372 4164 3792
4 12415.0 2011-01-06 11:12:00 2011-11-15 14:22:00 313 4128 3815

Matt Clarke, Friday, April 22, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.