When working with ecommerce and marketing data in time series analysis projects, national holidays (or bank holidays) can make a big difference to customer behaviour, so their dates are important model features.
National holidays don’t always fall on the same dates, and they differ from country to country, so it can be challenging to identify whether a given date fell on a national holiday or not.
In this project, I’ll show you how you can identify the dates of all national holidays that fall within a time period, for any country the Holidays package supports, using Python, Pandas, and the Holidays Python package.
To get started, open a new Jupyter notebook, install the Holidays package, and import Pandas using the import pandas as pd convention.
!pip3 install holidays
import pandas as pd
import holidays
Next, we’ll create a helper function that uses Holidays to fetch the dates of the national holidays for a given country that fall within a specified time period. We’ll first fetch the holidays for the country, then we’ll use the Pandas date_range() function to generate every date within the period.
Finally, we’ll filter that list of dates and return only the ones that are national holidays, as a list of Pandas Timestamp objects. Timestamp is the Pandas equivalent of Python’s datetime and can easily be converted to other Python date formats.
def get_national_holidays(start_date, end_date, country):
    """Return a list of the dates of national holidays between two dates
    for a given country.

    Args:
        start_date (str): Start date of the period
        end_date (str): End date of the period
        country (str): Country code to get the national holidays for

    Returns:
        list: List of national holiday dates as Pandas Timestamps
    """

    # Get the Bank Holidays for the given country
    holiday_days = holidays.country_holidays(country)

    # Create a list of dates between the start and end date
    date_range = pd.date_range(start_date, end_date)

    # Filter the dates to only include Bank Holidays
    national_holidays = [date for date in date_range if date in holiday_days]

    return national_holidays
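Since the function returns Pandas Timestamps, here is a minimal sketch of converting them to a few other common formats. The short hard-coded list stands in for the function's return value so the snippet runs on its own, and the variable names as_dates, as_strings, and as_index are only illustrative.

```python
import pandas as pd

# Stand-in for the list returned by get_national_holidays()
national_holidays = [
    pd.Timestamp("2022-01-17"),
    pd.Timestamp("2022-07-04"),
    pd.Timestamp("2022-12-26"),
]

# Standard library datetime.date objects
as_dates = [ts.date() for ts in national_holidays]

# ISO 8601 strings, handy for CSV exports or SQL queries
as_strings = [ts.strftime("%Y-%m-%d") for ts in national_holidays]

# A Pandas DatetimeIndex, useful for vectorised comparisons
as_index = pd.DatetimeIndex(national_holidays)

print(as_strings)
```

Which format is most convenient depends on where the dates are going next: strings for exports, date objects for plain Python code, and a DatetimeIndex for further work in Pandas.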
To fetch the dates of all national holidays (or bank holidays) for the UK, we pass the GB country code along with the start and end dates for our period. For the US, we use the US country code. Each call returns a list of Timestamps representing every national holiday or bank holiday in that country during the period.
national_holidays = get_national_holidays('2022-01-01', '2022-12-31', 'GB')
national_holidays
[Timestamp('2022-01-01 00:00:00', freq='D'),
Timestamp('2022-01-02 00:00:00', freq='D'),
Timestamp('2022-01-03 00:00:00', freq='D'),
Timestamp('2022-01-04 00:00:00', freq='D'),
Timestamp('2022-03-17 00:00:00', freq='D'),
Timestamp('2022-04-15 00:00:00', freq='D'),
Timestamp('2022-04-18 00:00:00', freq='D'),
Timestamp('2022-05-02 00:00:00', freq='D'),
Timestamp('2022-06-02 00:00:00', freq='D'),
Timestamp('2022-06-03 00:00:00', freq='D'),
Timestamp('2022-07-12 00:00:00', freq='D'),
Timestamp('2022-08-01 00:00:00', freq='D'),
Timestamp('2022-08-29 00:00:00', freq='D'),
Timestamp('2022-09-19 00:00:00', freq='D'),
Timestamp('2022-11-30 00:00:00', freq='D'),
Timestamp('2022-12-25 00:00:00', freq='D'),
Timestamp('2022-12-26 00:00:00', freq='D'),
Timestamp('2022-12-27 00:00:00', freq='D')]
national_holidays = get_national_holidays('2022-01-01', '2022-12-31', 'US')
national_holidays
[Timestamp('2022-01-01 00:00:00', freq='D'),
Timestamp('2022-01-17 00:00:00', freq='D'),
Timestamp('2022-02-21 00:00:00', freq='D'),
Timestamp('2022-05-30 00:00:00', freq='D'),
Timestamp('2022-06-19 00:00:00', freq='D'),
Timestamp('2022-06-20 00:00:00', freq='D'),
Timestamp('2022-07-04 00:00:00', freq='D'),
Timestamp('2022-09-05 00:00:00', freq='D'),
Timestamp('2022-10-10 00:00:00', freq='D'),
Timestamp('2022-11-11 00:00:00', freq='D'),
Timestamp('2022-11-24 00:00:00', freq='D'),
Timestamp('2022-12-25 00:00:00', freq='D'),
Timestamp('2022-12-26 00:00:00', freq='D')]
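To use these dates as a model feature, one option is to flag each row of a daily dataset according to whether its date is a holiday. The sketch below assumes a hypothetical dataframe df with a date column, and uses a hard-coded two-date list (both dates appear in the US output above) in place of a call to the helper function. The column name is_holiday is also just an illustration.

```python
import pandas as pd

# Stand-in for get_national_holidays('2022-12-23', '2022-12-28', 'US')
national_holidays = [pd.Timestamp("2022-12-25"), pd.Timestamp("2022-12-26")]

# Hypothetical daily dataset with one row per date
df = pd.DataFrame({"date": pd.date_range("2022-12-23", "2022-12-28")})

# 1 if the date is a national holiday, otherwise 0
df["is_holiday"] = df["date"].isin(national_holidays).astype(int)

print(df)
```

A binary flag like this is about the simplest way to encode holidays. Depending on the model, it may also be worth adding flags for the days immediately before and after each holiday.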
Matt Clarke, Wednesday, December 28, 2022