How to use sort_values() to sort a Pandas DataFrame

Learn how to use the sort_values() and sort_index() methods to sort a Pandas DataFrame by one or more columns, and even create custom sort orders for categorical data.

How to use sort_values() to sort a Pandas DataFrame
Picture by Pramod Tiwari, Pexels.
48 minutes to read

When working with Pandas dataframes you’ll commonly need to sort the data in some way. This is easy to do with the sort_values() and sort_index() methods. These two methods allow you to sort in ascending or descending order, alphabetical or reverse alphabetical order, and using times, datetimes, or even custom sort orders of your own creation. In this post, we’ll go through the various features of the sort_values() function, so you can use it to sort your data in the way you want.

The sort_values() method

The sort_values() is the most commonly used way to sort data in Pandas dataframes. At its most basic, it can be used to control when data are ordered in ascending or descending order. For example, if you wanted to sort the sales column in ascending order, however, the function also includes some more advanced parameters that allow you to do some much more sophisticated sorting. Here’s a summary of the parameters you can use with the sort_values() method and what they do. We’ll work through each one in the next steps.

Parameter Description
by The by parameter takes a string or list of strings that denote the columns to sort the dataframe by. For example, df.sort_values(by='PageViews') will sort a dataframe by the PageViews column, while df.sort_values(by=['PageViews', 'Events']) will sort by both PageViews and Events.
ascending The ascending parameter is optional and defaults to True so when you don't pass the parameter the dataframe will be sorted in ascending order of the defined column. To sort the dataframe values in descending order you need to pass the argument ascending=False.
kind The optional kind parameter defines the sorting algorithm Pandas uses to sort the column. By default, this is set to quicksort, but you can also set kind to mergesort, heapsort, or stable if you prefer. The numpy.sort() documentation explains how these sorting algorithms work.
inplace The optional inplace parameter is set to False by default and allows you to sort a dataframe without saving the changes back to the original. If you set this to inplace=True, the sort order you set will automatically be saved back to the source dataframe without you needing to redeclare it back to the dataframe's variable name.
na_position The optional na_position parameter is set to last by default and puts NaN values at the bottom of the sort order. If you want NaN values to appear at the top of the sort you can specify na_position='first'.
ignore_index The optional ignore_index parameter is set to False by default. When you sort the dataframe this will preserve the original index numbers. If you want to renumber your index so the values represent the new sort order you can pass in the argument ignore_index=True.
key The optional key parameter lets you apply a lambda function to a column before you sort it. It's really useful for sorting monetary values, since you can remove currency formatting and cast the dtype to a numeric value to obtain an accurate sort.

Create a dataframe

To get started, open a Jupyter notebook and import the Pandas and Numpy packages, then either load up a dataframe of your own data, or create a dataframe containing different Pandas dtypes and including some missing values. These will let us work through the various methods of sorting in Pandas to see how they handle data of different kinds.

import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({
     "page": [1, 2, 3, 4, 5, 6, 7],
     "day": ['Monday', 'Wednesday', 'Tuesday', 'Sunday', 'Thursday', 'Saturday', 'Friday'],
     "users": [34, 34, 45, 46, np.nan, 35, 42],
     "sessions": [493, 433, np.nan, 478, 427, np.nan, 233],
     "pageviews": [837, 35, 2342, 3423, 4322, 332, 300],
     "events": [2347, 24663, 36323, 24463, 374, 322, 2563],
     "revenue": ['£342.34', '£357.35', '£753.35', '£4732.34', '£584.35', '£2824.23', '£1008.98'], 
     "score": [1, 2, 3, -3, 4, -2, -1],
     "dates": pd.date_range('2022-01-01', periods=7, freq='D'), 
     "datetimes": pd.date_range('2022-01-01 09:00', periods=7, freq='H'),
    })
df
page day users sessions pageviews events revenue score dates datetimes
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00



Using by to sort a dataframe by a single column

The most commonly used way of sorting a dataframe is to sort by a single column using the by parameter. This will sort the dataframe in ascending order by default. That means numeric data will be sorted from smallest to largest and string data will be sorted alphabetically. If you want to sort in descending order, you can use the ascending parameter and set it to False.

df.sort_values(by='pageviews')
page day users sessions pageviews events revenue score dates datetimes
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00

Using by to sort a dataframe by multiple columns

You can also sort a Pandas dataframe by multiple columns. For example, you might want to order by pageviews first and then by events. To do this you simply pass a list of column names to the by parameter. By default, Pandas will sort in ascending order. If you want to sort in descending order, you can use the ascending parameter and set it to False` for each value.

df.sort_values(by=['pageviews', 'events'])
page day users sessions pageviews events revenue score dates datetimes
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00

Using ascending to sort a column in ascending order

As we saw above, by default Pandas will sort in ascending order. You don’t need to specify the ascending parameter if you want to sort in ascending order, but it can make your code easier for others to read.

df.sort_values(by='pageviews', ascending=True)
page day users sessions pageviews events revenue score dates datetimes
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00

Using ascending to sort a column in descending order

To sort a Pandas dataframe column in descending order, you can set the ascending parameter to False.

df.sort_values(by='pageviews', ascending=False)
page day users sessions pageviews events revenue score dates datetimes
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00

Using ascending to sort a column in alphabetical order

If the column contains non-numeric or categorical data with an object or categorical data type, you can sort the column in alphabetical order by setting the ascending parameter to True.

df.sort_values(by='day', ascending=True)
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00

Using ascending to sort a column in reverse alphabetical order

If you want to sort the column in reverse alphabetical order, set the ascending parameter to False.

df.sort_values(by='day', ascending=False)
page day users sessions pageviews events revenue score dates datetimes
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00

Using ascending to sort multiple columns in their own orders

If you want to sort multiple columns in their own orders, set the ascending parameter to a list of True and False values.

df.sort_values(by=['pageviews', 'events'], 
               ascending=[False, True])
page day users sessions pageviews events revenue score dates datetimes
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00

Using kind to set the sorting algorithm used

By default sort_values() uses the quicksort algorithm. You can change this by setting the kind parameter to mergesort or heapsort. The Numpy documentation has more information on the differences between the algorithms.

df.sort_values(by='sessions', kind='quicksort')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
df.sort_values(by='sessions', kind='mergesort')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
df.sort_values(by='sessions', kind='heapsort')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
df.sort_values(by='sessions', kind='stable')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

Using inplace to save the sort order to the source dataframe

If you want to save the sort order to the source dataframe, you can use the inplace parameter. This will modify the source dataframe and return None. It’s the same as using df = df.sort_values(by='sessions') but it’s more efficient.

df
page day users sessions pageviews events revenue score dates datetimes
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
df.sort_values(by='sessions', inplace=True)
df
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

Using na_position to set how NaN values are sorted

If you’re dealing with a dataset that contains missing values or NaNs, you can use the na_position parameter to set how these values are sorted. By default, NaN values are sorted to the end of the DataFrame.

df.sort_values(by='sessions')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

The above code should give you the same results as manually setting na_position='last'.

df.sort_values(by='sessions', na_position='last')
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

If you want to bring the NaNs to the top, you can use na_position='first'.

df.sort_values(by='sessions', na_position='first')
page day users sessions pageviews events revenue score dates datetimes
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00

Using ignore_index to create a new index based on the sort order

The ignore_index parameter can be used to create a new index based on the sort order. This is useful if you want to reset the index after sorting. The default value is False and will leave the original index intact.


```python
df.sort_values(by='sessions', ignore_index=False)
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

If you want to sort the data by a given sort order and then create a new index that reflects the sort order, you can do this by setting ignore_index to True.

df.sort_values(by='sessions', ignore_index=True)
page day users sessions pageviews events revenue score dates datetimes
0 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
1 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
2 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
5 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
6 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00

Using key to sort after applying a lambda function

The new key argument added to the sort_values() function in Pandas 1.1 allows you to apply a lambda function to the column before sorting the data. This can be very useful when you want to sort object data that represent currency or monetary values, where the presence of a pound sign or dollar sign aids formatting but turns a float into an object, which can’t be sorted as effectively. For example, let’s see what happens when we attempt to sort the Revenue column in descending order. The values, as you can see below, don’t get sorted as you would expect.

df.sort_values(by='revenue', ascending=False)
page day users sessions pageviews events revenue score dates datetimes
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00

By using the key parameter we can define a lambda function that gets applied to the column before the data gets sorted, but which retains the original formatting. The example below uses the replace() function to replace the pound sign with a blank space, then uses astype() to cast the object dtype to a float dtype, allowing it to be sorted correctly. The sort_values() returns the sorted values, but leaves the pound sign in the data.

df.sort_values(by='revenue', 
               ascending=False, 
               key=lambda val: val.str.replace('£', '').astype(float)
              )
page day users sessions pageviews events revenue score dates datetimes
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00

Using sort_index() to sort a dataframe by index number

Sometimes, you might want to sort the data in a dataframe based on the index itself. You can do this using the sort_index() method. The sort_index() method has the same parameters as sort_values(), but it only has one parameter, axis, which is set to 0 by default. The axis parameter is used to specify whether you want to sort the index or the columns. The axis parameter can be set to 0 to sort the index, or 1 to sort the columns.

df.sort_index(ascending=True)
page day users sessions pageviews events revenue score dates datetimes
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00

Using sort_index() to sort a dataframe by column name

You can also use the sort_index() function to sort a Pandas dataframe by column name. By default, sort_index() sets ascending to True, so columns will appear in alphabetical order, however, you can easily change this by using df.sort_index(axis=1, ascending=False) instead of df.sort_index(axis=1).

df.sort_index(axis=1)
dates datetimes day events page pageviews revenue score sessions users
6 2022-01-07 2022-01-01 15:00:00 Friday 2563 7 300 £1008.98 -1 233.0 42.0
4 2022-01-05 2022-01-01 13:00:00 Thursday 374 5 4322 £584.35 4 427.0 NaN
1 2022-01-02 2022-01-01 10:00:00 Wednesday 24663 2 35 £357.35 2 433.0 34.0
3 2022-01-04 2022-01-01 12:00:00 Sunday 24463 4 3423 £4732.34 -3 478.0 46.0
0 2022-01-01 2022-01-01 09:00:00 Monday 2347 1 837 £342.34 1 493.0 34.0
2 2022-01-03 2022-01-01 11:00:00 Tuesday 36323 3 2342 £753.35 3 NaN 45.0
5 2022-01-06 2022-01-01 14:00:00 Saturday 322 6 332 £2824.23 -2 NaN 35.0

Using key and abs to sort a dataframe by absolute value

By default, when you sort a numeric column it will appear in ascending order, so if you’ve got a mixture of positive and negative values, the negative values will appear at the top of the column. You can flip the orientation so that the values appear in descending order by using df.sort_values(by='column_name', ascending=False).

df.sort_values(by='score')
page day users sessions pageviews events revenue score dates datetimes
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00

In rare cases, you might want to consider negative and positive values equally via their absolute value. You can also do this using the key argument and by passing the Pandas abs function to it.

df.sort_values(by='score', key=pd.Series.abs)
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00

Using sort_values() to sort by date

Dates are also handled by sort_values() by default. You can sort by date by passing the column name to the by argument. The default sorting order is ascending, so the earliest date will be at the top of the dataframe. You can change this by passing ascending=False to the sort_values() function.

df.sort_values(by='dates', ascending=False)
page day users sessions pageviews events revenue score dates datetimes
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00

Using sort_values() to sort by datetime

You can also sort by datetime by passing the column name to the by argument. The default sorting order is ascending, so the earliest datetime will be at the top of the dataframe. You can change this by passing ascending=False to the sort_values() function.

df.sort_values(by='datetimes', ascending=True)
page day users sessions pageviews events revenue score dates datetimes
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00

Using a CategoricalDtype to create a custom sort order

Finally, you can even create custom sort orders of your own using the Pandas CategoricalDtype class. While the ascending parameter allows you to put categorical variables or object dtypes into alphabetical order, sometimes this won’t make sense. For example, as we saw above, weekdays in the Day column of our test dataset are placed in alphabetical order, but this doesn’t represent the order in which they should be sorted, i.e. Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. Similarly, month names and product sizes, such as S, XS, M, L, XL, XXL etc, require special treatment via the creation of a custom sort order.

A really neat way to do this is via the Pandas CategoricalDtype class of pandas.api.types. To use this, you’ll first need to load the class, then create a list of values to which you want to set the CategoricalDtype, and finally pass in the ordered=True argument which defines the custom sort order. Once that’s done, you can use the astype() function to cast the object data to a new category dtype, then you can pass the column name to by and sort the data using your custom sort order.

from pandas.api.types import CategoricalDtype
weekday = CategoricalDtype(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], ordered=True)
df['day'] = df['day'].astype(weekday)
df.sort_values(by='day')
page day users sessions pageviews events revenue score dates datetimes
0 1 Monday 34.0 493.0 837 2347 £342.34 1 2022-01-01 2022-01-01 09:00:00
2 3 Tuesday 45.0 NaN 2342 36323 £753.35 3 2022-01-03 2022-01-01 11:00:00
1 2 Wednesday 34.0 433.0 35 24663 £357.35 2 2022-01-02 2022-01-01 10:00:00
4 5 Thursday NaN 427.0 4322 374 £584.35 4 2022-01-05 2022-01-01 13:00:00
6 7 Friday 42.0 233.0 300 2563 £1008.98 -1 2022-01-07 2022-01-01 15:00:00
5 6 Saturday 35.0 NaN 332 322 £2824.23 -2 2022-01-06 2022-01-01 14:00:00
3 4 Sunday 46.0 478.0 3423 24463 £4732.34 -3 2022-01-04 2022-01-01 12:00:00
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 6 to 5
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   page       7 non-null      int64         
 1   day        7 non-null      category      
 2   users      6 non-null      float64       
 3   sessions   5 non-null      float64       
 4   pageviews  7 non-null      int64         
 5   events     7 non-null      int64         
 6   revenue    7 non-null      object        
 7   score      7 non-null      int64         
 8   dates      7 non-null      datetime64[ns]
 9   datetimes  7 non-null      datetime64[ns]
dtypes: category(1), datetime64[ns](2), float64(2), int64(4), object(1)
memory usage: 923.0+ bytes

Matt Clarke, Sunday, November 06, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.