If you regularly work with time series data in Pandas it’s probable that you’ll sometimes need to convert dates or datetimes and extract additional features from them.
This could be something as simple as extracting the month name from a date to make your data visualisations clearer, or you could be calculating the day of the week to see if there’s a correlation with your model’s target variable.
Whatever you’re doing with dates and datetimes, Pandas makes most things relatively simple. The built in Series.dt
class lets you access the values of a series in your dataframe and can return various date and time properties, such as the weekday, or quarter.
However, there are some things you can’t do in Pandas without resorting to additional bits of Python code, as the Series.dt
datetime properties don’t support every date or time feature you’re likely to need. Here’s a detailed guide showing how you can extract or calculate these features from dates, datetimes, and strings containing dates to help you with your projects.
To get started, open a new Jupyter notebook, import the pandas
package, and create a simple dataframe containing a few dates. I’ve deliberately used a non-standard format for the dates here. The 2021/04/29
dates shown are strings, not datetimes, but we’ll convert them in our next step.
import pandas as pd
df = pd.DataFrame({
'event': ['April Pay Day', 'Mother\'s Day', 'May Pay Day'],
'event_date': ['2021/04/29', '2021/04/29', '2021/05/09']
})
df.head()
event | event_date | |
---|---|---|
0 | April Pay Day | 2021/04/29 |
1 | Mother's Day | 2021/04/29 |
2 | May Pay Day | 2021/05/09 |
As you can see from the output of the info()
function, both columns in our dataframe are detected with the object
dtype, so our first step is going to be to convert these string values to datetimes, as all date manipulations in Pandas and Python require this common format.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 event 3 non-null object
1 event_date 3 non-null object
dtypes: object(2)
memory usage: 176.0+ bytes
To change a string or object
data type to a datetime or datetime64[ns]
data type we can use the Pandas to_datetime()
function. We’ll create a new column in our dataframe called date
and assignt the output of to_datetime()
to this. The second argument, format='%Y/%m/%d'
, tells Pandas that our dates are in the YYYY/MM/DD
format, so they get correctly reformatted.
df['date'] = pd.to_datetime(df['event_date'], format='%Y/%m/%d')
df.head()
event | event_date | date | |
---|---|---|---|
0 | April Pay Day | 2021/04/29 | 2021-04-29 |
1 | Mother's Day | 2021/04/29 | 2021-04-29 |
2 | May Pay Day | 2021/05/09 | 2021-05-09 |
To confirm that we’ve successfully completed our first step of converting our date string to a datetime
, we can re-run info()
and check that the Dtype
column of the new date
column is set to datetime64[ns]
. Now this is in the right format, we can reformat it to a wide range of different date features.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 event 3 non-null object
1 event_date 3 non-null object
2 date 3 non-null datetime64[ns]
dtypes: datetime64[ns](1), object(2)
memory usage: 200.0+ bytes
There are two main ways to reformat dates and extract features from them in Pandas. You can use the Pandas Series.dt
class, or you can use Python’s strftime()
function. We’ll start with the Series.dt
method. To use this method we’ll access the date
column, append the dt
method to it and assign the value to a new column.
These should be self-explanatory, but df['date'].dt.dayofweek
returns the day of the week, df['date'].dt.dayofyear
returns the day of the year, df['date'].dt.weekday
returns the weekday, and df['date'].dt.isocalendar().day
returns the weekday using an ISO week format. The other day features, such as the day name, need to be accessed using Python’s strftime()
function, which we’ll tackle in a moment.
df = df[['date']]
df['dayofweek'] = df['date'].dt.dayofweek
df['dayofyear'] = df['date'].dt.dayofyear
df['weekday'] = df['date'].dt.weekday
df['iso_weekday'] = df['date'].dt.isocalendar().day
df.head()
date | dayofweek | dayofyear | weekday | iso_weekday | |
---|---|---|---|---|---|
0 | 2021-04-29 | 3 | 119 | 3 | 4 |
1 | 2021-04-29 | 3 | 119 | 3 | 4 |
2 | 2021-05-09 | 6 | 129 | 6 | 7 |
Next, we’ll calculate the week
, month
, year
, and quarter
for each date in our dataframe. These all use dt
, apart from week
. This used to use dt.week
, but this is now deprecated, and you should instead use dt.isocalendar().week
.
df = df[['date']]
df['iso_week'] = df['date'].dt.isocalendar().week
df['year'] = df['date'].dt.year
df['month'] = df['date'].dt.month
df['quarter'] = df['date'].dt.quarter
df.head()
date | iso_week | year | month | quarter | |
---|---|---|---|---|---|
0 | 2021-04-29 | 17 | 2021 | 4 | 2 |
1 | 2021-04-29 | 17 | 2021 | 4 | 2 |
2 | 2021-05-09 | 18 | 2021 | 5 | 2 |
For ISO dates, there are a number of new functions that can be used to extract the week
, year
, and day
. These are formatted differently to the regular Series.dt
methods and need to use isocalendar()
.
df = df[['date']]
df['iso_year'] = df['date'].dt.isocalendar().year
df['iso_week'] = df['date'].dt.isocalendar().week
df['iso_weekday'] = df['date'].dt.isocalendar().day
df.head()
date | iso_year | iso_week | iso_weekday | |
---|---|---|---|---|
0 | 2021-04-29 | 2021 | 17 | 4 |
1 | 2021-04-29 | 2021 | 17 | 4 |
2 | 2021-05-09 | 2021 | 18 | 7 |
Since Pandas can’t do everything, we’ll need another solution to calculate other date features and reformat dates in different ways. The strftime
(short for “string from time”) function, lets you create custom date features from an existing datetime
column in your dataframe.
df = df[['date']]
df['month_year'] = df['date'].dt.strftime('%b-%Y')
df.head()
date | month_year | |
---|---|---|
0 | 2021-04-29 | Apr-2021 |
1 | 2021-04-29 | Apr-2021 |
2 | 2021-05-09 | May-2021 |
Python uses the 1989 C standard implementation for dates, which are common in most programming languages. These use a directive, such as %Y
, to tell the language how a date element should be reformatted. The most commonly used date directives are below, but you can see the full list of these in the Python documentation.
Date directive | Description | Example output |
---|---|---|
%a |
Weekday with locale abbreviation | Sat, Sun, Mon |
%A |
Full name of weekday for locale | Sunday |
%w |
Decimal weekday (Sun = 0, Sat = 6) | 0, 1, 2 |
%d |
Day of month with zero-padding | 01, 02, 03 |
%j |
Day of year with zero-padding | 001, 123, 365 |
df = df[['date']]
df['weekday_abbreviated'] = df['date'].dt.strftime('%a')
df['weekday_full'] = df['date'].dt.strftime('%A')
df['weekday_decimal'] = df['date'].dt.strftime('%w')
df['day_of_month'] = df['date'].dt.strftime('%d')
df['day_of_year'] = df['date'].dt.strftime('%j')
df.head()
date | weekday_abbreviated | weekday_full | weekday_decimal | day_of_month | day_of_year | |
---|---|---|---|---|---|---|
0 | 2021-04-29 | Thu | Thursday | 4 | 29 | 119 |
1 | 2021-04-29 | Thu | Thursday | 4 | 29 | 119 |
2 | 2021-05-09 | Sun | Sunday | 0 | 09 | 129 |
One thing to look out for with week numbers is that all days in a new year preceding the first Sunday (%U
) or Monday (%W
) are assigned to week 0
, which can cause confusion if you’re expecting them to be in week 1
.
Date directive | Description | Example output |
---|---|---|
%W |
Week (starting Mon) with zero-padding | 00, 02 |
%U |
Week (starting Sun) with zero-padding | 01, 52 |
%V |
ISO 8601 week (starting Mon) with zero-padding | 01, 52 |
df = df[['date']]
df['week_starting_monday'] = df['date'].dt.strftime('%W')
df['week_starting_sunday'] = df['date'].dt.strftime('%U')
df['iso_week_starting_monday'] = df['date'].dt.strftime('%V')
df.head()
date | week_starting_monday | week_starting_sunday | iso_week_starting_monday | |
---|---|---|---|---|
0 | 2021-04-29 | 17 | 17 | 17 |
1 | 2021-04-29 | 17 | 17 | 17 |
2 | 2021-05-09 | 18 | 19 | 18 |
Date directive | Description | Example output |
---|---|---|
%b |
Month name with locale abbreviation | Jan, Feb |
%B |
Full name of month for locale | January |
%m |
Month with zero-padding | 01, 02, 03 |
df = df[['date']]
df['month_name_abbreviated'] = df['date'].dt.strftime('%b')
df['month_name_full'] = df['date'].dt.strftime('%B')
df['month_number'] = df['date'].dt.strftime('%m')
df.head()
date | month_name_abbreviated | month_name_full | month_number | |
---|---|---|---|---|
0 | 2021-04-29 | Apr | April | 04 |
1 | 2021-04-29 | Apr | April | 04 |
2 | 2021-05-09 | May | May | 05 |
Date directive | Description | Example output |
---|---|---|
%Y |
Four digit year with century | 2021, 2050 |
%y |
Two digit year with zero-padding | 01, 20 |
df = df[['date']]
df['year_four_digit'] = df['date'].dt.strftime('%Y')
df['year_two_digit'] = df['date'].dt.strftime('%y')
df.head()
date | year_four_digit | year_two_digit | |
---|---|---|---|
0 | 2021-04-29 | 2021 | 21 |
1 | 2021-04-29 | 2021 | 21 |
2 | 2021-05-09 | 2021 | 21 |
You can also create custom date strings and date formats by adding a combination of date directives as arguments to strftime()
. For example, df['date'].dt.strftime('%A, %d %B, %Y')
will return a date in the following format Thursday, 29 April, 2021
.
df = df[['date']]
df['example_1'] = df['date'].dt.strftime('%A, %d %B, %Y')
df['example_2'] = df['date'].dt.strftime('%B %d, %Y')
df['example_3'] = df['date'].dt.strftime('%b %d, %y')
df.head()
date | example_1 | example_2 | example_3 | |
---|---|---|---|---|
0 | 2021-04-29 | Thursday, 29 April, 2021 | April 29, 2021 | Apr 29, 21 |
1 | 2021-04-29 | Thursday, 29 April, 2021 | April 29, 2021 | Apr 29, 21 |
2 | 2021-05-09 | Sunday, 09 May, 2021 | May 09, 2021 | May 09, 21 |
The Google Analytics yearMonth
API dimension returns a concatenated year-month value in the format 202110 or %Y%m
. To convert this to something more readable, such as October, 2021
we can run two conversions like this.
df['date'] = pd.to_datetime(df['yearMonth'], format='%Y%m')
df['period'] = df['date'].dt.strftime('%B, %Y')
If your dataframe contains dates with a time appended via the datetime
format, you can remove the time part and return the date without the time by using dt.date
. This is handy if you need to group your data by date, as it removes granularity.
df['order_date'] = df['order_date'].dt.date
The to_datetime()
function can be used to convert a timestamp integer, i.e. 1569324548, to a datetime by passing the unit='s'
argument.
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
Matt Clarke, Saturday, March 06, 2021