Most SQL databases store dates in the datetime
format. This is really useful because MySQL, and other SQL dialects, make it very easy to convert datetime
objects to a wide range of other date and time formats.
If you’re a data scientist working with MySQL data, you may wish to just stick to the datetime
format and reformat the dates in Pandas, however, it’s very easy to reformat dates in MySQL too. Here’s a quick guide to getting started.
I run my MySQL server in a Docker container and have created a test database based on the widely used Online Retail Dataset. I’ve created a couple of step-by-step guides that you can follow to create your own MySQL Docker container and import a MySQL database so you can follow along on the very same data. It only takes a few minutes.
Once you’ve got your MySQL server setup, open up a Jupyter notebook and install the SQLAlchemy and PyMySQL Python packages. We’ll be using these to connect to MySQL and pass in database queries using Pandas. Once you’ve installed the packages using the Pip package manager, import pandas
and the create_engine
module from sqlalchemy
.
!pip3 install sqlalchemy
!pip3 install pymysql
import pandas as pd
from sqlalchemy import create_engine
Next, we need to configure the connection to our MySQL server by creating a connection string to pass to the create_engine()
function. This will vary according to your own setup. I’m connecting to a MySQL server with the PyMySQL driver, using the username root and the password *****
on port 3306 of the IP address 172.17.0.3. The final argument tells create_engine()
to connect to the ecommerce
database.
engine = create_engine('mysql+pymysql://root:*****@172.17.0.2:3306/ecommerce')
To test the connection to the MySQL server we’ll first write a simple MySQL SELECT
statement to return everything from the order_items
table.
df = pd.read_sql_query("SELECT * FROM order_items", con=engine)
df.head()
invoice | sku | description | qty | order_date | unit_price | customer_id | country | line_price | |
---|---|---|---|---|---|---|---|---|---|
0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom | 15.30 |
1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom | 22.00 |
3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom | 20.34 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 invoice 541909 non-null object
1 sku 541909 non-null object
2 description 540455 non-null object
3 qty 541909 non-null int64
4 order_date 541909 non-null datetime64[ns]
5 unit_price 541909 non-null float64
6 customer_id 406829 non-null float64
7 country 541909 non-null object
8 line_price 541909 non-null float64
dtypes: datetime64[ns](1), float64(3), int64(1), object(4)
memory usage: 37.2+ MB
The DATE_FORMAT()
function makes it very easy to reformat a datetime
column in MySQL into another format, including a complex string representation of a date, such as “Wednesday, April 20th, 2022”. The DATE_FORMAT()
function takes two arguments - the MySQL column name containing the datetime
you wish to reformat and a date modifier argument, such as %Y
, which will convert a datetime
to a four-digit year. Here’s a quick example.
query = """
SELECT
invoice,
order_date,
DATE_FORMAT(order_date, '%Y') AS year
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice | order_date | year | |
---|---|---|---|
0 | 536365 | 2010-12-01 08:26:00 | 2010 |
1 | 536365 | 2010-12-01 08:26:00 | 2010 |
2 | 536365 | 2010-12-01 08:26:00 | 2010 |
3 | 536365 | 2010-12-01 08:26:00 | 2010 |
4 | 536365 | 2010-12-01 08:26:00 | 2010 |
All you need to do to reformat a datetime
to another date format is pass in one or more MySQL date formatter or “specifier” arguments, and MySQL will parse the date and reformat it as desired. There’s a huge selection of date formatters or specifiers that can be used either on their own or together to create a whole myriad of different date representations.
Specifier | Description |
%a | Abbreviated weekday name (Sun..Sat) |
%b | Abbreviated month name (Jan..Dec) |
%c | Month, numeric (0..12) |
%D | Day of the month with English suffix (0th, 1st, 2nd, 3rd, …) |
%d | Day of the month, numeric (00..31) |
%e | Day of the month, numeric (0..31) |
%f | Microseconds (000000..999999) |
%H | Hour (00..23) |
%h | Hour (01..12) |
%I | Hour (01..12) |
%i | Minutes, numeric (00..59) |
%j | Day of year (001..366) |
%k | Hour (0..23) |
%l | Hour (1..12) |
%M | Month name (January..December) |
%m | Month, numeric (00..12) |
%p | AM or PM |
%r | Time, 12-hour (hh:mm:ss followed by AM or PM) |
%S | Seconds (00..59) |
%s | Seconds (00..59) |
%T | Time, 24-hour (hh:mm:ss) |
%U | Week (00..53), where Sunday is the first day of the week; WEEK() mode 0 |
%u | Week (00..53), where Monday is the first day of the week; WEEK() mode 1 |
%V | Week (01..53), where Sunday is the first day of the week; WEEK() mode 2; used with %X |
%v | Week (01..53), where Monday is the first day of the week; WEEK() mode 3; used with %x |
%W | Weekday name (Sunday..Saturday) |
%w | Day of the week (0=Sunday..6=Saturday) |
%X | Year for the week where Sunday is the first day of the week, numeric, four digits; used with %V |
%x | Year for the week, where Monday is the first day of the week, numeric, four digits; used with %v |
%Y | Year, numeric, four digits |
%y | Year, numeric (two digits) |
%% | A literal % character |
%x | x, for any "x" not listed above |
One common problem is that you may want to reformat a datetime string to a MySQL date string, which are stored in the YYYY-MM-DD
format, with no time value at the end. To do this you need to select the correct three date specifiers from the table above and then separate the values with hyphens, so %Y-%m-%d
will return 2010-12-01
from a datetime
value of 2010-12-01 08:26:00
.
query = """
SELECT
invoice,
order_date,
DATE_FORMAT(order_date, '%Y-%m-%d') AS mysql_date
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice | order_date | mysql_date | |
---|---|---|---|
0 | 536365 | 2010-12-01 08:26:00 | 2010-12-01 |
1 | 536365 | 2010-12-01 08:26:00 | 2010-12-01 |
2 | 536365 | 2010-12-01 08:26:00 | 2010-12-01 |
3 | 536365 | 2010-12-01 08:26:00 | 2010-12-01 |
4 | 536365 | 2010-12-01 08:26:00 | 2010-12-01 |
There are various date formatters you can apply to DATE_FORMAT()
to return the day from a date in MySQL. The %W
specifier will return the day name, i.e. Wednesday; the %a
specifier will return the abbreviated day name, i.e. Wed; the %w
specifier will return the day of the month with a leading zero, i.e. 01; the %d
specifier will return the day of the month as a numeric value without a leading zero, i.e. 10, while the %j
specifier will return the day of the year, i.e. 335.
query = """
SELECT
invoice,
order_date,
DATE_FORMAT(order_date, '%W') AS day_name,
DATE_FORMAT(order_date, '%a') AS abbreviated_day_name,
DATE_FORMAT(order_date, '%w') AS day_of_week,
DATE_FORMAT(order_date, '%d') AS day_of_month,
DATE_FORMAT(order_date, '%e') AS day_of_month_numeric,
DATE_FORMAT(order_date, '%j') AS day_of_year
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice | order_date | day_name | abbreviated_day_name | day_of_week | day_of_month | day_of_month_numeric | day_of_year | |
---|---|---|---|---|---|---|---|---|
0 | 536365 | 2010-12-01 08:26:00 | Wednesday | Wed | 3 | 01 | 1 | 335 |
1 | 536365 | 2010-12-01 08:26:00 | Wednesday | Wed | 3 | 01 | 1 | 335 |
2 | 536365 | 2010-12-01 08:26:00 | Wednesday | Wed | 3 | 01 | 1 | 335 |
3 | 536365 | 2010-12-01 08:26:00 | Wednesday | Wed | 3 | 01 | 1 | 335 |
4 | 536365 | 2010-12-01 08:26:00 | Wednesday | Wed | 3 | 01 | 1 | 335 |
There are three DATE_FORMAT()
specifiers you can use to convert a date into month name or month number representations. The %M
specifier will return the month name from a date, i.e. December; the %b
specifier will return the abbreviated month name from a date, i.e. Dec for December, while the %m
specifier will return the month number from the date, i.e. 12 for December.
query = """
SELECT
invoice,
order_date,
DATE_FORMAT(order_date, '%M') AS month_name,
DATE_FORMAT(order_date, '%b') AS abbreviated_month_name,
DATE_FORMAT(order_date, '%m') AS month_number
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.head()
invoice | order_date | month_name | abbreviated_month_name | month_number | |
---|---|---|---|---|---|
0 | 536365 | 2010-12-01 08:26:00 | December | Dec | 12 |
1 | 536365 | 2010-12-01 08:26:00 | December | Dec | 12 |
2 | 536365 | 2010-12-01 08:26:00 | December | Dec | 12 |
3 | 536365 | 2010-12-01 08:26:00 | December | Dec | 12 |
4 | 536365 | 2010-12-01 08:26:00 | December | Dec | 12 |
As we saw above with the YYYY-MM-DD
MySQL date example, you can also mix your own strings to the date specifiers to create more complex date formats. Here’s an example of a written version of a date and time that converts a standard datetime
to a more verbose written format.
query = """
SELECT
invoice,
order_date,
DATE_FORMAT(order_date, '%W, %M %D, %Y, at %r') AS mysql_date
FROM order_items
"""
df = pd.read_sql_query(query, con=engine)
df.sample(5)
invoice | order_date | mysql_date | |
---|---|---|---|
325810 | 565456 | 2011-09-04 16:12:00 | Sunday, September 4th, 2011, at 04:12:00 PM |
346190 | 567183 | 2011-09-18 15:32:00 | Sunday, September 18th, 2011, at 03:32:00 PM |
406368 | 571747 | 2011-10-19 10:59:00 | Wednesday, October 19th, 2011, at 10:59:00 AM |
28669 | 538652 | 2010-12-13 15:12:00 | Monday, December 13th, 2010, at 03:12:00 PM |
488729 | 577850 | 2011-11-22 11:04:00 | Tuesday, November 22nd, 2011, at 11:04:00 AM |
Matt Clarke, Tuesday, April 19, 2022