How to use Pandas CategoricalDtype to create custom sort orders

Learn how to use the Pandas CategoricalDtype to create custom sort orders for weekdays, month names, and sizes, so you can sort them correctly.

How to use Pandas CategoricalDtype to create custom sort orders
Picture by Ron Lach, Pexels.
10 minutes to read

When working with Pandas, you’ll often need to sort a dataframe by one or more columns. While the Pandas sort_values() method makes it easy to sort categorical data in alphabetical order or reverse alphabetical order, and numeric values in ascending or descending order, sometimes values need their own custom sort order because they’re not in alphabetical or numeric order.

For example, weekdays and month names are not in alphabetical order, and sizes are not in numeric order. So, if you want to sort a dataframe by weekdays, month names, or sizes, you need to create a custom sort order. In the past, I’ve done this using map(), but there is a far better way - the Pandas CategoricalDtype().

The CategoricalDtype is a Pandas data type that specifies categorical data, but it also allows you to specify a custom sort order. This means you can use the CategoricalDtype to sort categorical data in any order you want, not just alphabetical or numeric order. In this post, you’ll learn how to use the Pandas CategoricalDtype to create custom sort orders for weekdays, month names, and sizes, so you can sort them correctly.

Import the packages

To get started, open a Jupyter notebook and import the Pandas package and the CategoricalDtype from the Pandas types module.

import pandas as pd
from pandas.api.types import CategoricalDtype

Create a custom sort order for weekdays

First, we’ll create a custom sort order for weekdays. To show what happens without a custom sort order, let’s create a dataframe with a column of weekdays and use sort_values() to sort the dataframe by the weekdays column. As you can see, the weekdays are sorted in alphabetical order, which is not the correct order in which days occur…

df = pd.DataFrame.from_dict({
    "day": ['Monday', 'Sunday', 'Wednesday', 'Friday',
            'Tuesday', 'Thursday', 'Saturday', ],
})
df.sort_values(by='day')
day
3 Friday
0 Monday
6 Saturday
1 Sunday
5 Thursday
4 Tuesday
2 Wednesday

To solve the problem we’ll pass a list of weekday names to CategoricalDtype() in the order we want them to be sorted along with the ordered=True argument. Then, we’ll use the astype() method to convert the day column to the CategoricalDtype we created. Finally, we’ll use sort_values() to sort the dataframe by the day column, and we’ll see that the weekdays are now sorted in the correct order.

weekday = CategoricalDtype(['Monday', 'Tuesday', 'Wednesday', 
                            'Thursday', 'Friday', 'Saturday', 'Sunday'], 
                           ordered=True)
df['day'] = df['day'].astype(weekday)
df.sort_values(by='day')
day
0 Monday
4 Tuesday
2 Wednesday
5 Thursday
3 Friday
6 Saturday
1 Sunday

Create a custom sort order for month names

The other very common categorical data type that requires a custom sort order are month names. Let’s create a Pandas dataframe of month names in random order and then see again what happens with the default sort order of sort_values(), which places them in alphabetical order.

df = pd.DataFrame.from_dict({
    "month": ['January', 'October', 'November', 'February', 
              'May', 'June','March', 'April', 
              'July', 'August', 'September', 'December'],
})
df.sort_values(by='month')
month
7 April
9 August
11 December
3 February
0 January
8 July
5 June
6 March
4 May
2 November
1 October
10 September

To create a custom sort order for month names, we can simply pass our list of month names to CategoricalDtype() in the order we want them to be sorted along with the ordered=True argument. Then, we’ll use the astype() method to convert the month column to the CategoricalDtype we created. Finally, we’ll use sort_values() to sort the dataframe by the month column, and we’ll see that the months are now sorted in the correct order.

month = CategoricalDtype(['January', 'February', 'March', 'April', 
                          'May', 'June', 'July', 'August', 
                          'September', 'October', 'November', 'December'], 
                         ordered=True)
df['month'] = df['month'].astype(month)
df.sort_values(by='month')
month
0 January
3 February
6 March
7 April
4 May
5 June
8 July
9 August
10 September
1 October
2 November
11 December

Create a custom sort order for sizes

Product, garment or clothing sizes are another common categorical data type that requires a custom sort order. Let’s create a Pandas dataframe of sizes in random order and then see again what happens with the default sort order of sort_values(), which places them in alphabetical order.

df = pd.DataFrame.from_dict({
    "size": ['L', 'XL', 'XS', 'S', 'M', 'XXL']
})
df.sort_values(by='size')
size
0 L
4 M
3 S
1 XL
2 XS
5 XXL

To create a custom sort order for product sizes, we simply pass a list of sizes placed in the correct order to CategoricalDtype() along with the ordered=True argument. Then, we’ll use the astype() method to convert the size column to the CategoricalDtype we created. Finally, we’ll use sort_values() to sort the dataframe by the size column, and we’ll see that the sizes are now sorted in the correct order.

size = CategoricalDtype(['XS', 'S', 'M', 'L', 'XL', 'XXL'], ordered=True)
df[''] = df['size'].astype(size)
df.sort_values(by='size')
size
0 XS
4 S
3 M
1 L
2 XL
5 XXL

Matt Clarke, Sunday, November 06, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.