When working with customer data in ecommerce it’s very common for data scientists to need to add and subtract from date values directly within MySQL queries. For example, you might want to identify which of the customers who’ve placed an order recently went on to make another purchase in the next 12 months.
MySQL provides two useful functions for adding and subtracting date and time values from datetime values stored in MySQL - DATE_ADD()
and DATE_SUB()
. They both work in the same way.
You simply define the column you wish to add to or subtract from, then define the date or time interval you want to add or subtract. MySQL will then return a date with your desired number of years, months, weeks, seconds, or microseconds added or removed. Here’s a quick guide to using them.
If you want to run these queries yourself in Python, you’ll first need access to a MySQL server and a database to query. I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset.
I’ve created a couple of step-by-step guides that you can follow to create a MySQL Docker container and import a MySQL database into the Docker container, so you can follow along on the very same data. It only takes a few minutes.
Next, you’ll need to install the SQLAlchemy package that is used to query MySQL and other databases via Pandas, and the PyMySQL driver for connecting to a MySQL server from Python. You can install these using the Pip package manager, then import the required modules into a Jupyter notebook or Python script.
!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine
Next, you’ll need to create a PyMySQL query string and connect to the MySQL server running on your Docker container, or wherever you host your MySQL database.
engine = create_engine('mysql+pymysql://root:*****@172.17.0.3:3306/ecommerce')
Once you’ve configured your database connection, you can create a simple SQL SELECT
statement to check that the connection is working correctly.
df = pd.read_sql("SELECT * FROM customers", con=engine)
df.head()
customer_id | revenue | orders | skus | units | first_order_date | last_order_date | |
---|---|---|---|---|---|---|---|
0 | 14646.0 | 279489.02 | 77 | 703 | 196719 | 2010-12-20 10:09:00 | 2011-12-08 12:12:00 |
1 | 18102.0 | 256438.49 | 62 | 151 | 64122 | 2010-12-07 16:42:00 | 2011-12-09 11:50:00 |
2 | 17450.0 | 187482.17 | 55 | 127 | 69029 | 2010-12-07 09:23:00 | 2011-12-01 13:29:00 |
3 | 14911.0 | 132572.62 | 248 | 1794 | 77180 | 2010-12-01 14:05:00 | 2011-12-08 15:54:00 |
4 | 12415.0 | 123725.45 | 26 | 444 | 77242 | 2011-01-06 11:12:00 | 2011-11-15 14:22:00 |
To subtract days, weeks, months, quarters, hours, or seconds from a date in MySQL you can use the DATE_SUB()
function. This uses the format DATE_SUB(date, INTERVAL value)
where value
is one of the allowed date or time interval values to subtract and then returns the datetime
value after the date subtraction. The allowed date and time interval values you can pass to DATE_SUB()
are:
YEAR
YEAR_MONTH
QUARTER
MONTH
WEEK
DAY
HOUR
MINUTE
SECOND
MICROSECOND
SECOND_MICROSECOND
MINUTE_MICROSECOND
HOUR_MICROSECOND
HOUR_SECOND
HOUR_MINUTE
DAY_HOUR
DAY_MINUTE
DAY_SECOND
DAY_MICROSECOND
Here’s an example of how to use the DATE_SUB()
function in MySQL. To get the date the day before a date, we can use DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_before_first_order
.
To get the same day last year, we can use DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_before_first_order
. You can simply change the value passed after INTERVAL
to subtract any time or date value you wish, and then use AS
to assign the value to a new column in the MySQL result set.
query = """
SELECT
customer_id,
first_order_date,
DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_before_first_order,
DATE_SUB(first_order_date, INTERVAL 10 DAY) AS 10_days_before_first_order,
DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_before_first_order
FROM customers
"""
df = pd.read_sql(query, con=engine)
df.head()
customer_id | first_order_date | day_before_first_order | 10_days_before_first_order | year_before_first_order | |
---|---|---|---|---|---|
0 | 14646.0 | 2010-12-20 10:09:00 | 2010-12-19 10:09:00 | 2010-12-10 10:09:00 | 2009-12-20 10:09:00 |
1 | 18102.0 | 2010-12-07 16:42:00 | 2010-12-06 16:42:00 | 2010-11-27 16:42:00 | 2009-12-07 16:42:00 |
2 | 17450.0 | 2010-12-07 09:23:00 | 2010-12-06 09:23:00 | 2010-11-27 09:23:00 | 2009-12-07 09:23:00 |
3 | 14911.0 | 2010-12-01 14:05:00 | 2010-11-30 14:05:00 | 2010-11-21 14:05:00 | 2009-12-01 14:05:00 |
4 | 12415.0 | 2011-01-06 11:12:00 | 2011-01-05 11:12:00 | 2010-12-27 11:12:00 | 2010-01-06 11:12:00 |
The DATE_ADD()
function of MySQL works in just the same way as DATE_SUB()
but adds to a datetime value. Like DATE_SUB()
, DATE_ADD()
also uses a similar syntax, so DATE_ADD(date, INTERVAL value)
. The function then adds the desired number of seconds, minutes, hours, or years to the date provided. Here are a few examples:
query = """
SELECT
customer_id,
first_order_date,
DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_after_first_order,
DATE_SUB(first_order_date, INTERVAL 3 MONTH) AS three_months_after_first_order,
DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_after_first_order
FROM customers
"""
df = pd.read_sql(query, con=engine)
df.head()
customer_id | first_order_date | day_after_first_order | three_months_after_first_order | year_after_first_order | |
---|---|---|---|---|---|
0 | 14646.0 | 2010-12-20 10:09:00 | 2010-12-19 10:09:00 | 2010-09-20 10:09:00 | 2009-12-20 10:09:00 |
1 | 18102.0 | 2010-12-07 16:42:00 | 2010-12-06 16:42:00 | 2010-09-07 16:42:00 | 2009-12-07 16:42:00 |
2 | 17450.0 | 2010-12-07 09:23:00 | 2010-12-06 09:23:00 | 2010-09-07 09:23:00 | 2009-12-07 09:23:00 |
3 | 14911.0 | 2010-12-01 14:05:00 | 2010-11-30 14:05:00 | 2010-09-01 14:05:00 | 2009-12-01 14:05:00 |
4 | 12415.0 | 2011-01-06 11:12:00 | 2011-01-05 11:12:00 | 2010-10-06 11:12:00 | 2010-01-06 11:12:00 |
Matt Clarke, Thursday, April 21, 2022