How to use Pandas date_range() to create date ranges

Picture by Aphiwat Chuangchoem, Pexels.

20 minutes to read

Data Science Pandas

Pandas includes some incredible features for working with dates and times. The Pandas date_range() function is used to create a range of dates and can be used to create a list of dates in a range, or a dataframe of dates within a period. The date_range() function takes the following parameters and returns a fixed frequency DatetimeIndex, from which a Pandas dataframe of dates, or list of dates, can be generated.

pandas.date_range(start=None, 
                  end=None, 
                  periods=None, 
                  freq=None, 
                  tz=None, 
                  normalize=False, 
                  name=None, 
                  closed=NoDefault.no_default, 
                  inclusive=None, 
                  **kwargs)

Parameter	Description
`start`	The `start` parameter is an optional string or datetime-like value that represents the left-bound or start date when generating a list of dates in a date range.
`end`	The `end` parameter is an optional string or datetime-like value that represents the right-bound or end date when generating a list of dates in a date range.
`periods`	The `periods` parameter is an optional integer that represents the number of periods to generate in a date range.
`freq`	The `freq` parameter is an optional string or Pandas offset object that represents the frequency of the date range. This defaults to D for days, but you can also use multiples, so 12H would generate datetimes every 12 hours. It's quite powerful.
`tz`	The `tz` parameter is an optional string or timezone object that represents the timezone of the date range.
`normalize`	The `normalize` parameter is an optional boolean that represents whether or not to normalize the start and end datetimes in the date range to midnight.
`name`	The `name` parameter is an optional string that represents the name of the date range.
`closed`	The `closed` parameter is an optional string that represents whether the date range is left-inclusive or right-inclusive. This defaults to None, but you can also use left or right. This was deprecated in Pandas 1.4.0.
`inclusive`	The `inclusive` parameter is an optional value that can be set to both, neither, left, or right (and defaults to both). It's used to define whether to include boundaries or not.
`kwargs`	The `kwargs` parameter is an optional dictionary of keyword arguments that are passed to the underlying date range function.

The Pandas date_range() function has loads of uses in data science projects, especially time series data analysis, time series forecasting, and projects involving financial data. It can be used to create arrays of dates, lists of dates, and Pandas dataframes. In this project, we’ll go over some examples of how you can use date_range() to perform various data science tasks.

Import the packages

To get started, open a new Jupyter notebook and import the Pandas package using the import pandas as pd convention.

import pandas as pd

Create a DatetimeIndex of dates within a range

To get started, we’ll first use date_range() to create a DatetimeIndex containing all the dates in a date range between two dates, identified by the start and end parameters. By default, this sets the freq argument to D for days, but you can also change this to return a range of other date and time types.

pd.date_range(start='2022-01-01', end='2022-01-31')

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
               '2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
               '2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
               '2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
               '2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
               '2022-01-29', '2022-01-30', '2022-01-31'],
              dtype='datetime64[ns]', freq='D')

Create a dataframe of dates within a range

By default, date_range() returns a DatetimeIndex of dates, however, in many cases you’ll actually want a Pandas dataframe of dates to be returned. To return a Pandas dataframe instead of a DatetimeIndex you can pass the original pd.date_range(start='2022-01-01', end='2022-01-31') code to pd.DataFrame() to create a new dataframe containing the dates. If you pass columns=['date'] you can also define the column name in which the date range is stored.

df = pd.DataFrame(pd.date_range(start='2022-01-01', end='2022-01-31'), columns=['date'])
df.head(3)

	date
0	2022-01-01
1	2022-01-02
2	2022-01-03

df.tail(3)

	date
28	2022-01-29
29	2022-01-30
30	2022-01-31

Create a list of dates within a date range

To get date_range() to return a Python list of dates instead, we can append the Pandas tolist() function to the pd.date_range(start='2022-01-01', end='2022-01-31') code. We’ll assign the output to a variable called dates_list and print the first three values using dates_list[:3].

dates_list = pd.date_range(start='2022-01-01', end='2022-01-31').tolist()
dates_list[:3]

[Timestamp('2022-01-01 00:00:00', freq='D'),
 Timestamp('2022-01-02 00:00:00', freq='D'),
 Timestamp('2022-01-03 00:00:00', freq='D')]

Now that we’ve covered how you can export date ranges as a DatetimeIndex, a dataframe, or a list, we’ll look at some of the other date_range() parameters you can use to control the export of date ranges created by the function.

Use the freq parameter to specify the date frequency

By defining the freq parameter we can specify the frequency of the date range. The freq argument takes a Pandas frequency string. There are loads of these and you can use them to calculate pretty much every kind of date. The default is D for daily, and this will return a date for each day within the date range defined. We can also use other values, such as W for weekly, M for monthly, Q for quarterly, and A for annual. Let’s create a DatetimeIndex that contains the start date for each week within a date range.

Get all dates by week frequency

pd.date_range(start='2022-01-01', end='2022-01-31', freq='W')

DatetimeIndex(['2022-01-02', '2022-01-09', '2022-01-16', '2022-01-23',
               '2022-01-30'],
              dtype='datetime64[ns]', freq='W-SUN')

Get all dates by month start frequency

pd.date_range(start='2022-01-01', end='2022-12-31', freq='MS')

DatetimeIndex(['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01',
               '2022-05-01', '2022-06-01', '2022-07-01', '2022-08-01',
               '2022-09-01', '2022-10-01', '2022-11-01', '2022-12-01'],
              dtype='datetime64[ns]', freq='MS')

Get all dates by month end frequency

pd.date_range(start='2022-01-01', end='2022-12-31', freq='M')

DatetimeIndex(['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30',
               '2022-05-31', '2022-06-30', '2022-07-31', '2022-08-31',
               '2022-09-30', '2022-10-31', '2022-11-30', '2022-12-31'],
              dtype='datetime64[ns]', freq='M')

Get all dates by business day frequency

pd.date_range(start='2022-01-01', end='2022-01-31', freq='B')

DatetimeIndex(['2022-01-03', '2022-01-04', '2022-01-05', '2022-01-06',
               '2022-01-07', '2022-01-10', '2022-01-11', '2022-01-12',
               '2022-01-13', '2022-01-14', '2022-01-17', '2022-01-18',
               '2022-01-19', '2022-01-20', '2022-01-21', '2022-01-24',
               '2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
               '2022-01-31'],
              dtype='datetime64[ns]', freq='B')

Pandas offset aliases

As we’ve seen above, you can pass in a Pandas offset alias to the freq parameter to change it from using the default D value which returns days. There are a huge range of offset aliases that support almost every type of date imaginable. Of course, these will only work in they are in the interval or period defined.

Alias	Description
`B`	The `B` offset returns data using a business day frequency.
`C`	The `C` offset returns data using a custom business day frequency.
`D`	The `D` offset returns data using a daily frequency.
`W`	The `W` offset returns data using a weekly frequency.
`M`	The `M` offset returns data using a month end frequency.
`SM`	The `SM` offset returns data using a semi-month end frequency.
`BM`	The `BM` offset returns data using a business month end frequency.
`CBM`	The `CBM` offset returns data using a custom business month end frequency.
`MS`	The `MS` offset returns data using a month start frequency.
`SMS`	The `SMS` offset returns data using a semi month start frequency.
`BMS`	The `BMS` offset returns data using a business month start frequency.
`CBMS`	The `CBMS` offset returns data using a custom business month start frequency.
`Q`	The `Q` offset returns data using a quarter end frequency.
`BQ`	The `BQ` offset returns data using a business quarter end frequency.
`QS`	The `QS` offset returns data using a quarter start frequency.
`BQS`	The `BQS` offset returns data using a business quarter start frequency.
`A`	The `A` offset returns data using a year end frequency.
`Y`	The `Y` offset returns data using a year end frequency.
`BA`	The `BA` offset returns data using a business year end frequency.
`BY`	The `BY` offset returns data using a business year end frequency.
`AS`	The `AS` offset returns data using a year start frequency.
`YS`	The `YS` offset returns data using a year start frequency.
`BAS`	The `BAS` offset returns data using a business year start frequency.
`BYS`	The `BYS` offset returns data using a business year start frequency.
`H`	The `H` offset returns data using an hourly frequency.
`T`	The `T` offset returns data using a minute frequency.
`min`	The `min` offset returns data using a minute frequency.
`S`	The `S` offset returns data using a second frequency.
`L`	The `L` offset returns data using a millisecond frequency.
`ms`	The `ms` offset returns data using a millisecond frequency.
`U`	The `U` offset returns data using a microsecond frequency.
`us`	The `us` offset returns data using a microsecond frequency.
`N`	The `N` offset returns data using a nanosecond frequency.

Use periods to get a number of dates

The periods argument is used to get a number of dates after a given start date. This is very useful for creating a future dataframe of dates in a forthcoming period when creating a time series forecast model. In the example below, we’ll create a future dataframe containing all the dates for the next 90 days after 2022-01-01.

pd.date_range(start='2022-01-01', periods=90)

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04',
               '2022-01-05', '2022-01-06', '2022-01-07', '2022-01-08',
               '2022-01-09', '2022-01-10', '2022-01-11', '2022-01-12',
               '2022-01-13', '2022-01-14', '2022-01-15', '2022-01-16',
               '2022-01-17', '2022-01-18', '2022-01-19', '2022-01-20',
               '2022-01-21', '2022-01-22', '2022-01-23', '2022-01-24',
               '2022-01-25', '2022-01-26', '2022-01-27', '2022-01-28',
               '2022-01-29', '2022-01-30', '2022-01-31', '2022-02-01',
               '2022-02-02', '2022-02-03', '2022-02-04', '2022-02-05',
               '2022-02-06', '2022-02-07', '2022-02-08', '2022-02-09',
               '2022-02-10', '2022-02-11', '2022-02-12', '2022-02-13',
               '2022-02-14', '2022-02-15', '2022-02-16', '2022-02-17',
               '2022-02-18', '2022-02-19', '2022-02-20', '2022-02-21',
               '2022-02-22', '2022-02-23', '2022-02-24', '2022-02-25',
               '2022-02-26', '2022-02-27', '2022-02-28', '2022-03-01',
               '2022-03-02', '2022-03-03', '2022-03-04', '2022-03-05',
               '2022-03-06', '2022-03-07', '2022-03-08', '2022-03-09',
               '2022-03-10', '2022-03-11', '2022-03-12', '2022-03-13',
               '2022-03-14', '2022-03-15', '2022-03-16', '2022-03-17',
               '2022-03-18', '2022-03-19', '2022-03-20', '2022-03-21',
               '2022-03-22', '2022-03-23', '2022-03-24', '2022-03-25',
               '2022-03-26', '2022-03-27', '2022-03-28', '2022-03-29',
               '2022-03-30', '2022-03-31'],
              dtype='datetime64[ns]', freq='D')

To create a dataframe of all dates after a date for a given number of days, use the following code:

df = pd.DataFrame(pd.date_range(start='2023-01-01', periods=90), columns=['date'])
df.head(3)

	date
0	2023-01-01
1	2023-01-02
2	2023-01-03

df.tail(3)

	date
87	2023-03-29
88	2023-03-30
89	2023-03-31

Use periods and freq to create dates every X frequency

You can, of course, combine the periods argument with freq and a custom offset to create a DatetimeIndex of datetimes that occur every X frequency over X periods. For example, pd.date_range('2022-01-01', periods=24, freq="1H") will create a series of datetime stamps for every hour of the day on a single date.

pd.date_range('2022-01-01', periods=24, freq="1H")

DatetimeIndex(['2022-01-01 00:00:00', '2022-01-01 01:00:00',
               '2022-01-01 02:00:00', '2022-01-01 03:00:00',
               '2022-01-01 04:00:00', '2022-01-01 05:00:00',
               '2022-01-01 06:00:00', '2022-01-01 07:00:00',
               '2022-01-01 08:00:00', '2022-01-01 09:00:00',
               '2022-01-01 10:00:00', '2022-01-01 11:00:00',
               '2022-01-01 12:00:00', '2022-01-01 13:00:00',
               '2022-01-01 14:00:00', '2022-01-01 15:00:00',
               '2022-01-01 16:00:00', '2022-01-01 17:00:00',
               '2022-01-01 18:00:00', '2022-01-01 19:00:00',
               '2022-01-01 20:00:00', '2022-01-01 21:00:00',
               '2022-01-01 22:00:00', '2022-01-01 23:00:00'],
              dtype='datetime64[ns]', freq='H')

To get the first day of the month at three month intervals over four periods we can use pd.date_range('2022-01-01', periods=4, freq="3MS").

pd.date_range('2022-01-01', periods=4, freq="3MS")

DatetimeIndex(['2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01'], dtype='datetime64[ns]', freq='3MS')

Use tz to get datetimes for a timezone

The tz parameter can be used to get datetimes for a timezone. The timezone can be specified as a string or as a pytz timezone object. In the examples below, we’ll fetch datetimes for the Europe/London timezone and the Asia/Kolkata timezone.

pd.date_range(start='2022-01-01', end='2022-01-31', tz='Europe/London')

DatetimeIndex(['2022-01-01 00:00:00+00:00', '2022-01-02 00:00:00+00:00',
               '2022-01-03 00:00:00+00:00', '2022-01-04 00:00:00+00:00',
               '2022-01-05 00:00:00+00:00', '2022-01-06 00:00:00+00:00',
               '2022-01-07 00:00:00+00:00', '2022-01-08 00:00:00+00:00',
               '2022-01-09 00:00:00+00:00', '2022-01-10 00:00:00+00:00',
               '2022-01-11 00:00:00+00:00', '2022-01-12 00:00:00+00:00',
               '2022-01-13 00:00:00+00:00', '2022-01-14 00:00:00+00:00',
               '2022-01-15 00:00:00+00:00', '2022-01-16 00:00:00+00:00',
               '2022-01-17 00:00:00+00:00', '2022-01-18 00:00:00+00:00',
               '2022-01-19 00:00:00+00:00', '2022-01-20 00:00:00+00:00',
               '2022-01-21 00:00:00+00:00', '2022-01-22 00:00:00+00:00',
               '2022-01-23 00:00:00+00:00', '2022-01-24 00:00:00+00:00',
               '2022-01-25 00:00:00+00:00', '2022-01-26 00:00:00+00:00',
               '2022-01-27 00:00:00+00:00', '2022-01-28 00:00:00+00:00',
               '2022-01-29 00:00:00+00:00', '2022-01-30 00:00:00+00:00',
               '2022-01-31 00:00:00+00:00'],
              dtype='datetime64[ns, Europe/London]', freq='D')

pd.date_range(start='2022-01-01', end='2022-01-31', tz='Asia/Kolkata')

DatetimeIndex(['2022-01-01 00:00:00+05:30', '2022-01-02 00:00:00+05:30',
               '2022-01-03 00:00:00+05:30', '2022-01-04 00:00:00+05:30',
               '2022-01-05 00:00:00+05:30', '2022-01-06 00:00:00+05:30',
               '2022-01-07 00:00:00+05:30', '2022-01-08 00:00:00+05:30',
               '2022-01-09 00:00:00+05:30', '2022-01-10 00:00:00+05:30',
               '2022-01-11 00:00:00+05:30', '2022-01-12 00:00:00+05:30',
               '2022-01-13 00:00:00+05:30', '2022-01-14 00:00:00+05:30',
               '2022-01-15 00:00:00+05:30', '2022-01-16 00:00:00+05:30',
               '2022-01-17 00:00:00+05:30', '2022-01-18 00:00:00+05:30',
               '2022-01-19 00:00:00+05:30', '2022-01-20 00:00:00+05:30',
               '2022-01-21 00:00:00+05:30', '2022-01-22 00:00:00+05:30',
               '2022-01-23 00:00:00+05:30', '2022-01-24 00:00:00+05:30',
               '2022-01-25 00:00:00+05:30', '2022-01-26 00:00:00+05:30',
               '2022-01-27 00:00:00+05:30', '2022-01-28 00:00:00+05:30',
               '2022-01-29 00:00:00+05:30', '2022-01-30 00:00:00+05:30',
               '2022-01-31 00:00:00+05:30'],
              dtype='datetime64[ns, Asia/Kolkata]', freq='D')

Use union() to combine different date ranges with anchored offsets

Let’s say you wanted to create a dataframe that contains the date of the payday weekend in each month, where that’s the fourth Friday, Saturday, and Sunday of each month. To do this, you can use a more advanced Pandas frequency feature called an anchored offset.

We’ll call the date_range() function and use WOM-4FRI, WOM-4SAT, and WOM-4SUN to define the periodicity, and combine these with union() to return a single DatetimeIndex from which we can create a Pandas dataframe. Note that the WOM offset shown refers to WeekOfMonth and isn’t actually covered in the Pandas documentation.

df = pd.DataFrame(pd.date_range("2022-01-01", periods=12, freq='WOM-4FRI')
                  .union(pd.date_range("2022-01-01", periods=12, freq='WOM-4SAT')
                  .union(pd.date_range("2022-01-01", periods=12, freq='WOM-4SUN'))
                  ), columns=['date'])
df

	date
0	2022-01-22
1	2022-01-23
2	2022-01-28
3	2022-02-25
4	2022-02-26
5	2022-02-27
6	2022-03-25
7	2022-03-26
8	2022-03-27
9	2022-04-22
10	2022-04-23
11	2022-04-24
12	2022-05-22
13	2022-05-27
14	2022-05-28
15	2022-06-24
16	2022-06-25
17	2022-06-26
18	2022-07-22
19	2022-07-23
20	2022-07-24
21	2022-08-26
22	2022-08-27
23	2022-08-28
24	2022-09-23
25	2022-09-24
26	2022-09-25
27	2022-10-22
28	2022-10-23
29	2022-10-28
30	2022-11-25
31	2022-11-26
32	2022-11-27
33	2022-12-23
34	2022-12-24
35	2022-12-25

Matt Clarke, Sunday, November 13, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.