Pandas includes some incredible features for working with dates and times. The Pandas date_range()
function is used to create a range of dates and can be used to create a list of dates in a range, or a dataframe of dates within a period. The date_range()
function takes the following parameters and returns a fixed frequency DatetimeIndex
, from which a Pandas dataframe of dates, or list of dates, can be generated.
pandas.date_range(start=None,
end=None,
periods=None,
freq=None,
tz=None,
normalize=False,
name=None,
closed=NoDefault.no_default,
inclusive=None,
**kwargs)
Parameter | Description |
---|---|
start |
The start parameter is an optional string or datetime-like value that represents the left-bound or start date when generating a list of dates in a date range. |
end |
The end parameter is an optional string or datetime-like value that represents the right-bound or end date when generating a list of dates in a date range. |
periods |
The periods parameter is an optional integer that represents the number of periods to generate in a date range. |
freq |
The freq parameter is an optional string or Pandas offset object that represents the frequency of the date range. This defaults to D for days, but you can also use multiples, so 12H would generate datetimes every 12 hours. It's quite powerful. |
tz |
The tz parameter is an optional string or timezone object that represents the timezone of the date range. |
normalize |
The normalize parameter is an optional boolean that represents whether or not to normalize the start and end datetimes in the date range to midnight. |
name |
The name parameter is an optional string that represents the name of the date range. |
closed |
The closed parameter is an optional string that represents whether the date range is left-inclusive or right-inclusive. This defaults to None, but you can also use left or right. This was deprecated in Pandas 1.4.0. |
inclusive |
The inclusive parameter is an optional value that can be set to both, neither, left, or right (and defaults to both). It's used to define whether to include boundaries or not. |
kwargs |
The kwargs parameter is an optional dictionary of keyword arguments that are passed to the underlying date range function. |
The Pandas date_range()
function has loads of uses in data science projects, especially time series data analysis, time series forecasting, and projects involving financial data. It can be used to create arrays of dates, lists of dates, and Pandas dataframes. In this project, we’ll go over some examples of how you can use date_range()
to perform various data science tasks.
To get started, open a new Jupyter notebook and import the Pandas package using the import pandas as pd
convention.
import pandas as pd
To get started, we’ll first use date_range()
to create a DatetimeIndex
containing all the dates in a date range between two dates, identified by the start
and end
parameters. By default, this sets the freq
argument to D
for days, but you can also change this to return a range of other date and time types.
pd.date_range(start='2022-01-01', end='2022-01-31')
DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
'2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
'2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
'2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
'2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
'2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
'2022-01-29', '2022-01-30', '2022-01-31'],
dtype='datetime64[ns]', freq='D')
By default, date_range()
returns a DatetimeIndex
of dates, however, in many cases you’ll actually want a Pandas dataframe of dates to be returned. To return a Pandas dataframe instead of a DatetimeIndex
you can pass the original pd.date_range(start='2022-01-01', end='2022-01-31')
code to pd.DataFrame()
to create a new dataframe containing the dates. If you pass columns=['date']
you can also define the column name in which the date range is stored.
df = pd.DataFrame(pd.date_range(start='2022-01-01', end='2022-01-31'), columns=['date'])
df.head(3)
date | |
---|---|
0 | 2022-01-01 |
1 | 2022-01-02 |
2 | 2022-01-03 |
df.tail(3)
date | |
---|---|
28 | 2022-01-29 |
29 | 2022-01-30 |
30 | 2022-01-31 |
To get date_range()
to return a Python list of dates instead, we can append the Pandas tolist()
function to the pd.date_range(start='2022-01-01', end='2022-01-31')
code. We’ll assign the output to a variable called dates_list
and print the first three values using dates_list[:3]
.
dates_list = pd.date_range(start='2022-01-01', end='2022-01-31').tolist()
dates_list[:3]
[Timestamp('2022-01-01 00:00:00', freq='D'),
Timestamp('2022-01-02 00:00:00', freq='D'),
Timestamp('2022-01-03 00:00:00', freq='D')]
Now that we’ve covered how you can export date ranges as a DatetimeIndex
, a dataframe, or a list, we’ll look at some of the other date_range()
parameters you can use to control the export of date ranges created by the function.
By defining the freq
parameter we can specify the frequency of the date range. The freq
argument takes a Pandas frequency string. There are loads of these and you can use them to calculate pretty much every kind of date. The default is D
for daily, and this will return a date for each day within the date range defined. We can also use other values, such as W
for weekly, M
for monthly, Q
for quarterly, and A
for annual. Let’s create a DatetimeIndex that contains the start date for each week within a date range.
pd.date_range(start='2022-01-01', end='2022-01-31', freq='W')
DatetimeIndex(['2022-01-02', '2022-01-09', '2022-01-16', '2022-01-23',
'2022-01-30'],
dtype='datetime64[ns]', freq='W-SUN')
pd.date_range(start='2022-01-01', end='2022-12-31', freq='MS')
DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01',
'2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01',
'2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'],
dtype='datetime64[ns]', freq='MS')
pd.date_range(start='2022-01-01', end='2022-12-31', freq='M')
DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
'2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
'2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31'],
dtype='datetime64[ns]', freq='M')
pd.date_range(start='2022-01-01', end='2022-01-31', freq='B')
DatetimeIndex(['2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',
'2022-01-07', '2022-01-10', '2022-01-11', '2022-01-12',
'2022-01-13', '2022-01-14', '2022-01-17', '2022-01-18',
'2022-01-19', '2022-01-20', '2022-01-21', '2022-01-24',
'2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
'2022-01-31'],
dtype='datetime64[ns]', freq='B')
As we’ve seen above, you can pass in a Pandas offset alias to the freq
parameter to change it from using the default D
value which returns days. There are a huge range of offset aliases that support almost every type of date imaginable. Of course, these will only work in they are in the interval or period defined.
Alias | Description |
---|---|
B |
The B offset returns data using a business day frequency. |
C |
The C offset returns data using a custom business day frequency. |
D |
The D offset returns data using a daily frequency. |
W |
The W offset returns data using a weekly frequency. |
M |
The M offset returns data using a month end frequency. |
SM |
The SM offset returns data using a semi-month end frequency. |
BM |
The BM offset returns data using a business month end frequency. |
CBM |
The CBM offset returns data using a custom business month end frequency. |
MS |
The MS offset returns data using a month start frequency. |
SMS |
The SMS offset returns data using a semi month start frequency. |
BMS |
The BMS offset returns data using a business month start frequency. |
CBMS |
The CBMS offset returns data using a custom business month start frequency. |
Q |
The Q offset returns data using a quarter end frequency. |
BQ |
The BQ offset returns data using a business quarter end frequency. |
QS |
The QS offset returns data using a quarter start frequency. |
BQS |
The BQS offset returns data using a business quarter start frequency. |
A |
The A offset returns data using a year end frequency. |
Y |
The Y offset returns data using a year end frequency. |
BA |
The BA offset returns data using a business year end frequency. |
BY |
The BY offset returns data using a business year end frequency. |
AS |
The AS offset returns data using a year start frequency. |
YS |
The YS offset returns data using a year start frequency. |
BAS |
The BAS offset returns data using a business year start frequency. |
BYS |
The BYS offset returns data using a business year start frequency. |
H |
The H offset returns data using an hourly frequency. |
T |
The T offset returns data using a minute frequency. |
min |
The min offset returns data using a minute frequency. |
S |
The S offset returns data using a second frequency. |
L |
The L offset returns data using a millisecond frequency. |
ms |
The ms offset returns data using a millisecond frequency. |
U |
The U offset returns data using a microsecond frequency. |
us |
The us offset returns data using a microsecond frequency. |
N |
The N offset returns data using a nanosecond frequency. |
The periods
argument is used to get a number of dates after a given start
date. This is very useful for creating a future dataframe of dates in a forthcoming period when creating a time series forecast model. In the example below, we’ll create a future dataframe containing all the dates for the next 90 days after 2022-01-01.
pd.date_range(start='2022-01-01', periods=90)
DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
'2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
'2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
'2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
'2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
'2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
'2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
'2022-01-29', '2022-01-30', '2022-01-31', '2022-02-01',
'2022-02-02', '2022-02-03', '2022-02-04', '2022-02-05',
'2022-02-06', '2022-02-07', '2022-02-08', '2022-02-09',
'2022-02-10', '2022-02-11', '2022-02-12', '2022-02-13',
'2022-02-14', '2022-02-15', '2022-02-16', '2022-02-17',
'2022-02-18', '2022-02-19', '2022-02-20', '2022-02-21',
'2022-02-22', '2022-02-23', '2022-02-24', '2022-02-25',
'2022-02-26', '2022-02-27', '2022-02-28', '2022-03-01',
'2022-03-02', '2022-03-03', '2022-03-04', '2022-03-05',
'2022-03-06', '2022-03-07', '2022-03-08', '2022-03-09',
'2022-03-10', '2022-03-11', '2022-03-12', '2022-03-13',
'2022-03-14', '2022-03-15', '2022-03-16', '2022-03-17',
'2022-03-18', '2022-03-19', '2022-03-20', '2022-03-21',
'2022-03-22', '2022-03-23', '2022-03-24', '2022-03-25',
'2022-03-26', '2022-03-27', '2022-03-28', '2022-03-29',
'2022-03-30', '2022-03-31'],
dtype='datetime64[ns]', freq='D')
To create a dataframe of all dates after a date for a given number of days, use the following code:
df = pd.DataFrame(pd.date_range(start='2023-01-01', periods=90), columns=['date'])
df.head(3)
date | |
---|---|
0 | 2023-01-01 |
1 | 2023-01-02 |
2 | 2023-01-03 |
df.tail(3)
date | |
---|---|
87 | 2023-03-29 |
88 | 2023-03-30 |
89 | 2023-03-31 |
You can, of course, combine the periods
argument with freq
and a custom offset to create a DatetimeIndex
of datetimes that occur every X frequency over X periods. For example, pd.date_range('2022-01-01', periods=24, freq="1H")
will create a series of datetime stamps for every hour of the day on a single date.
pd.date_range('2022-01-01', periods=24, freq="1H")
DatetimeIndex(['2022-01-01 00:00:00', '2022-01-01 01:00:00',
'2022-01-01 02:00:00', '2022-01-01 03:00:00',
'2022-01-01 04:00:00', '2022-01-01 05:00:00',
'2022-01-01 06:00:00', '2022-01-01 07:00:00',
'2022-01-01 08:00:00', '2022-01-01 09:00:00',
'2022-01-01 10:00:00', '2022-01-01 11:00:00',
'2022-01-01 12:00:00', '2022-01-01 13:00:00',
'2022-01-01 14:00:00', '2022-01-01 15:00:00',
'2022-01-01 16:00:00', '2022-01-01 17:00:00',
'2022-01-01 18:00:00', '2022-01-01 19:00:00',
'2022-01-01 20:00:00', '2022-01-01 21:00:00',
'2022-01-01 22:00:00', '2022-01-01 23:00:00'],
dtype='datetime64[ns]', freq='H')
To get the first day of the month at three month intervals over four periods we can use pd.date_range('2022-01-01', periods=4, freq="3MS")
.
pd.date_range('2022-01-01', periods=4, freq="3MS")
DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='3MS')
The tz
parameter can be used to get datetimes for a timezone. The timezone can be specified as a string or as a pytz
timezone object. In the examples below, we’ll fetch datetimes for the Europe/London
timezone and the Asia/Kolkata
timezone.
pd.date_range(start='2022-01-01', end='2022-01-31', tz='Europe/London')
DatetimeIndex(['2022-01-01 00:00:00+00:00', '2022-01-02 00:00:00+00:00',
'2022-01-03 00:00:00+00:00', '2022-01-04 00:00:00+00:00',
'2022-01-05 00:00:00+00:00', '2022-01-06 00:00:00+00:00',
'2022-01-07 00:00:00+00:00', '2022-01-08 00:00:00+00:00',
'2022-01-09 00:00:00+00:00', '2022-01-10 00:00:00+00:00',
'2022-01-11 00:00:00+00:00', '2022-01-12 00:00:00+00:00',
'2022-01-13 00:00:00+00:00', '2022-01-14 00:00:00+00:00',
'2022-01-15 00:00:00+00:00', '2022-01-16 00:00:00+00:00',
'2022-01-17 00:00:00+00:00', '2022-01-18 00:00:00+00:00',
'2022-01-19 00:00:00+00:00', '2022-01-20 00:00:00+00:00',
'2022-01-21 00:00:00+00:00', '2022-01-22 00:00:00+00:00',
'2022-01-23 00:00:00+00:00', '2022-01-24 00:00:00+00:00',
'2022-01-25 00:00:00+00:00', '2022-01-26 00:00:00+00:00',
'2022-01-27 00:00:00+00:00', '2022-01-28 00:00:00+00:00',
'2022-01-29 00:00:00+00:00', '2022-01-30 00:00:00+00:00',
'2022-01-31 00:00:00+00:00'],
dtype='datetime64[ns, Europe/London]', freq='D')
pd.date_range(start='2022-01-01', end='2022-01-31', tz='Asia/Kolkata')
DatetimeIndex(['2022-01-01 00:00:00+05:30', '2022-01-02 00:00:00+05:30',
'2022-01-03 00:00:00+05:30', '2022-01-04 00:00:00+05:30',
'2022-01-05 00:00:00+05:30', '2022-01-06 00:00:00+05:30',
'2022-01-07 00:00:00+05:30', '2022-01-08 00:00:00+05:30',
'2022-01-09 00:00:00+05:30', '2022-01-10 00:00:00+05:30',
'2022-01-11 00:00:00+05:30', '2022-01-12 00:00:00+05:30',
'2022-01-13 00:00:00+05:30', '2022-01-14 00:00:00+05:30',
'2022-01-15 00:00:00+05:30', '2022-01-16 00:00:00+05:30',
'2022-01-17 00:00:00+05:30', '2022-01-18 00:00:00+05:30',
'2022-01-19 00:00:00+05:30', '2022-01-20 00:00:00+05:30',
'2022-01-21 00:00:00+05:30', '2022-01-22 00:00:00+05:30',
'2022-01-23 00:00:00+05:30', '2022-01-24 00:00:00+05:30',
'2022-01-25 00:00:00+05:30', '2022-01-26 00:00:00+05:30',
'2022-01-27 00:00:00+05:30', '2022-01-28 00:00:00+05:30',
'2022-01-29 00:00:00+05:30', '2022-01-30 00:00:00+05:30',
'2022-01-31 00:00:00+05:30'],
dtype='datetime64[ns, Asia/Kolkata]', freq='D')
Let’s say you wanted to create a dataframe that contains the date of the payday weekend in each month, where that’s the fourth Friday, Saturday, and Sunday of each month. To do this, you can use a more advanced Pandas frequency feature called an anchored offset.
We’ll call the date_range()
function and use WOM-4FRI
, WOM-4SAT
, and WOM-4SUN
to define the periodicity, and combine these with union()
to return a single DatetimeIndex
from which we can create a Pandas dataframe. Note that the WOM
offset shown refers to WeekOfMonth
and isn’t actually covered in the Pandas documentation.
df = pd.DataFrame(pd.date_range("2022-01-01", periods=12, freq='WOM-4FRI')
.union(pd.date_range("2022-01-01", periods=12, freq='WOM-4SAT')
.union(pd.date_range("2022-01-01", periods=12, freq='WOM-4SUN'))
), columns=['date'])
df
date | |
---|---|
0 | 2022-01-22 |
1 | 2022-01-23 |
2 | 2022-01-28 |
3 | 2022-02-25 |
4 | 2022-02-26 |
5 | 2022-02-27 |
6 | 2022-03-25 |
7 | 2022-03-26 |
8 | 2022-03-27 |
9 | 2022-04-22 |
10 | 2022-04-23 |
11 | 2022-04-24 |
12 | 2022-05-22 |
13 | 2022-05-27 |
14 | 2022-05-28 |
15 | 2022-06-24 |
16 | 2022-06-25 |
17 | 2022-06-26 |
18 | 2022-07-22 |
19 | 2022-07-23 |
20 | 2022-07-24 |
21 | 2022-08-26 |
22 | 2022-08-27 |
23 | 2022-08-28 |
24 | 2022-09-23 |
25 | 2022-09-24 |
26 | 2022-09-25 |
27 | 2022-10-22 |
28 | 2022-10-23 |
29 | 2022-10-28 |
30 | 2022-11-25 |
31 | 2022-11-26 |
32 | 2022-11-27 |
33 | 2022-12-23 |
34 | 2022-12-24 |
35 | 2022-12-25 |
Matt Clarke, Sunday, November 13, 2022