When working with customer data in ecommerce it’s very common for data scientists to need to add and subtract from date values directly within MySQL queries. For example, you might want to identify which of the customers who’ve placed an order recently went on to make another purchase in the next 12 months.
MySQL provides two useful functions for adding and subtracting date and time values from datetime values stored in MySQL -
DATE_SUB(). They both work in the same way.
You simply define the column you wish to add to or subtract from, then define the date or time interval you want to add or subtract. MySQL will then return a date with your desired number of years, months, weeks, seconds, or microseconds added or removed. Here’s a quick guide to using them.
If you want to run these queries yourself in Python, you’ll first need access to a MySQL server and a database to query. I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset.
I’ve created a couple of step-by-step guides that you can follow to create a MySQL Docker container and import a MySQL database into the Docker container, so you can follow along on the very same data. It only takes a few minutes.
Next, you’ll need to install the SQLAlchemy package that is used to query MySQL and other databases via Pandas, and the PyMySQL driver for connecting to a MySQL server from Python. You can install these using the Pip package manager, then import the required modules into a Jupyter notebook or Python script.
!pip3 install sqlalchemy !pip3 install pymysql
import pandas as pd from sqlalchemy import create_engine
Next, you’ll need to create a PyMySQL query string and connect to the MySQL server running on your Docker container, or wherever you host your MySQL database.
engine = create_engine('mysql+pymysql://root:*****@172.17.0.3:3306/ecommerce')
Once you’ve configured your database connection, you can create a simple SQL
SELECT statement to check that the connection is working correctly.
df = pd.read_sql("SELECT * FROM customers", con=engine) df.head()
|0||14646.0||279489.02||77||703||196719||2010-12-20 10:09:00||2011-12-08 12:12:00|
|1||18102.0||256438.49||62||151||64122||2010-12-07 16:42:00||2011-12-09 11:50:00|
|2||17450.0||187482.17||55||127||69029||2010-12-07 09:23:00||2011-12-01 13:29:00|
|3||14911.0||132572.62||248||1794||77180||2010-12-01 14:05:00||2011-12-08 15:54:00|
|4||12415.0||123725.45||26||444||77242||2011-01-06 11:12:00||2011-11-15 14:22:00|
To subtract days, weeks, months, quarters, hours, or seconds from a date in MySQL you can use the
DATE_SUB() function. This uses the format
DATE_SUB(date, INTERVAL value) where
value is one of the allowed date or time interval values to subtract and then returns the
datetime value after the date subtraction. The allowed date and time interval values you can pass to
Here’s an example of how to use the
DATE_SUB() function in MySQL. To get the date the day before a date, we can use
DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_before_first_order.
To get the same day last year, we can use
DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_before_first_order. You can simply change the value passed after
INTERVAL to subtract any time or date value you wish, and then use
AS to assign the value to a new column in the MySQL result set.
query = """ SELECT customer_id, first_order_date, DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_before_first_order, DATE_SUB(first_order_date, INTERVAL 10 DAY) AS 10_days_before_first_order, DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_before_first_order FROM customers """ df = pd.read_sql(query, con=engine) df.head()
|0||14646.0||2010-12-20 10:09:00||2010-12-19 10:09:00||2010-12-10 10:09:00||2009-12-20 10:09:00|
|1||18102.0||2010-12-07 16:42:00||2010-12-06 16:42:00||2010-11-27 16:42:00||2009-12-07 16:42:00|
|2||17450.0||2010-12-07 09:23:00||2010-12-06 09:23:00||2010-11-27 09:23:00||2009-12-07 09:23:00|
|3||14911.0||2010-12-01 14:05:00||2010-11-30 14:05:00||2010-11-21 14:05:00||2009-12-01 14:05:00|
|4||12415.0||2011-01-06 11:12:00||2011-01-05 11:12:00||2010-12-27 11:12:00||2010-01-06 11:12:00|
DATE_ADD() function of MySQL works in just the same way as
DATE_SUB() but adds to a datetime value. Like
DATE_ADD() also uses a similar syntax, so
DATE_ADD(date, INTERVAL value). The function then adds the desired number of seconds, minutes, hours, or years to the date provided. Here are a few examples:
query = """ SELECT customer_id, first_order_date, DATE_SUB(first_order_date, INTERVAL 1 DAY) AS day_after_first_order, DATE_SUB(first_order_date, INTERVAL 3 MONTH) AS three_months_after_first_order, DATE_SUB(first_order_date, INTERVAL 1 YEAR) AS year_after_first_order FROM customers """ df = pd.read_sql(query, con=engine) df.head()
|0||14646.0||2010-12-20 10:09:00||2010-12-19 10:09:00||2010-09-20 10:09:00||2009-12-20 10:09:00|
|1||18102.0||2010-12-07 16:42:00||2010-12-06 16:42:00||2010-09-07 16:42:00||2009-12-07 16:42:00|
|2||17450.0||2010-12-07 09:23:00||2010-12-06 09:23:00||2010-09-07 09:23:00||2009-12-07 09:23:00|
|3||14911.0||2010-12-01 14:05:00||2010-11-30 14:05:00||2010-09-01 14:05:00||2009-12-01 14:05:00|
|4||12415.0||2011-01-06 11:12:00||2011-01-05 11:12:00||2010-10-06 11:12:00||2010-01-06 11:12:00|
Matt Clarke, Thursday, April 21, 2022