How to use sort_values() to sort a Pandas DataFrame

Picture by Pramod Tiwari, Pexels.

48 minutes to read

Data Science Pandas

When working with Pandas dataframes you’ll commonly need to sort the data in some way. This is easy to do with the sort_values() and sort_index() methods. These two methods allow you to sort in ascending or descending order, alphabetical or reverse alphabetical order, and using times, datetimes, or even custom sort orders of your own creation. In this post, we’ll go through the various features of the sort_values() function, so you can use it to sort your data in the way you want.

The sort_values() method

The sort_values() is the most commonly used way to sort data in Pandas dataframes. At its most basic, it can be used to control when data are ordered in ascending or descending order. For example, if you wanted to sort the sales column in ascending order, however, the function also includes some more advanced parameters that allow you to do some much more sophisticated sorting. Here’s a summary of the parameters you can use with the sort_values() method and what they do. We’ll work through each one in the next steps.

Parameter	Description
by	The `by` parameter takes a string or list of strings that denote the columns to sort the dataframe by. For example, `df.sort_values(by='PageViews')` will sort a dataframe by the `PageViews` column, while `df.sort_values(by=['PageViews', 'Events'])` will sort by both `PageViews` and `Events`.
ascending	The `ascending` parameter is optional and defaults to `True` so when you don't pass the parameter the dataframe will be sorted in ascending order of the defined column. To sort the dataframe values in descending order you need to pass the argument `ascending=False`.
kind	The optional `kind` parameter defines the sorting algorithm Pandas uses to sort the column. By default, this is set to `quicksort`, but you can also set `kind` to `mergesort`, `heapsort`, or `stable` if you prefer. The `numpy.sort()` documentation explains how these sorting algorithms work.
inplace	The optional `inplace` parameter is set to `False` by default and allows you to sort a dataframe without saving the changes back to the original. If you set this to `inplace=True`, the sort order you set will automatically be saved back to the source dataframe without you needing to redeclare it back to the dataframe's variable name.
na_position	The optional `na_position` parameter is set to `last` by default and puts NaN values at the bottom of the sort order. If you want NaN values to appear at the top of the sort you can specify `na_position='first'`.
ignore_index	The optional `ignore_index` parameter is set to `False` by default. When you sort the dataframe this will preserve the original index numbers. If you want to renumber your index so the values represent the new sort order you can pass in the argument `ignore_index=True`.
key	The optional `key` parameter lets you apply a `lambda` function to a column before you sort it. It's really useful for sorting monetary values, since you can remove currency formatting and cast the `dtype` to a numeric value to obtain an accurate sort.

Create a dataframe

To get started, open a Jupyter notebook and import the Pandas and Numpy packages, then either load up a dataframe of your own data, or create a dataframe containing different Pandas dtypes and including some missing values. These will let us work through the various methods of sorting in Pandas to see how they handle data of different kinds.

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({
     "page": [1, 2, 3, 4, 5, 6, 7],
     "day": ['Monday', 'Wednesday', 'Tuesday', 'Sunday', 'Thursday', 'Saturday', 'Friday'],
     "users": [34, 34, 45, 46, np.nan, 35, 42],
     "sessions": [493, 433, np.nan, 478, 427, np.nan, 233],
     "pageviews": [837, 35, 2342, 3423, 4322, 332, 300],
     "events": [2347, 24663, 36323, 24463, 374, 322, 2563],
     "revenue": ['£342.34', '£357.35', '£753.35', '£4732.34', '£584.35', '£2824.23', '£1008.98'], 
     "score": [1, 2, 3, -3, 4, -2, -1],
     "dates": pd.date_range('2022-01-01', periods=7, freq='D'), 
     "datetimes": pd.date_range('2022-01-01 09:00', periods=7, freq='H'),
    })
df

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

Using by to sort a dataframe by a single column

The most commonly used way of sorting a dataframe is to sort by a single column using the by parameter. This will sort the dataframe in ascending order by default. That means numeric data will be sorted from smallest to largest and string data will be sorted alphabetically. If you want to sort in descending order, you can use the ascending parameter and set it to False.

df.sort_values(by='pageviews')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00

Using by to sort a dataframe by multiple columns

You can also sort a Pandas dataframe by multiple columns. For example, you might want to order by pageviews first and then by events. To do this you simply pass a list of column names to the by parameter. By default, Pandas will sort in ascending order. If you want to sort in descending order, you can use the ascending parameter and set it to False` for each value.

df.sort_values(by=['pageviews', 'events'])

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00

Using ascending to sort a column in ascending order

As we saw above, by default Pandas will sort in ascending order. You don’t need to specify the ascending parameter if you want to sort in ascending order, but it can make your code easier for others to read.

df.sort_values(by='pageviews', ascending=True)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00

Using ascending to sort a column in descending order

To sort a Pandas dataframe column in descending order, you can set the ascending parameter to False.

df.sort_values(by='pageviews', ascending=False)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00

Using ascending to sort a column in alphabetical order

If the column contains non-numeric or categorical data with an object or categorical data type, you can sort the column in alphabetical order by setting the ascending parameter to True.

df.sort_values(by='day', ascending=True)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00

Using ascending to sort a column in reverse alphabetical order

If you want to sort the column in reverse alphabetical order, set the ascending parameter to False.

df.sort_values(by='day', ascending=False)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

Using ascending to sort multiple columns in their own orders

If you want to sort multiple columns in their own orders, set the ascending parameter to a list of True and False values.

df.sort_values(by=['pageviews', 'events'], 
               ascending=[False, True])

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00

Using kind to set the sorting algorithm used

By default sort_values() uses the quicksort algorithm. You can change this by setting the kind parameter to mergesort or heapsort. The Numpy documentation has more information on the differences between the algorithms.

df.sort_values(by='sessions', kind='quicksort')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

df.sort_values(by='sessions', kind='mergesort')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

df.sort_values(by='sessions', kind='heapsort')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

df.sort_values(by='sessions', kind='stable')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

Using inplace to save the sort order to the source dataframe

If you want to save the sort order to the source dataframe, you can use the inplace parameter. This will modify the source dataframe and return None. It’s the same as using df = df.sort_values(by='sessions') but it’s more efficient.

df

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

df.sort_values(by='sessions', inplace=True)

df

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

Using na_position to set how NaN values are sorted

If you’re dealing with a dataset that contains missing values or NaNs, you can use the na_position parameter to set how these values are sorted. By default, NaN values are sorted to the end of the DataFrame.

df.sort_values(by='sessions')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

The above code should give you the same results as manually setting na_position='last'.

df.sort_values(by='sessions', na_position='last')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

If you want to bring the NaNs to the top, you can use na_position='first'.

df.sort_values(by='sessions', na_position='first')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00

Using ignore_index to create a new index based on the sort order

The ignore_index parameter can be used to create a new index based on the sort order. This is useful if you want to reset the index after sorting. The default value is False and will leave the original index intact.

```python
df.sort_values(by='sessions', ignore_index=False)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

If you want to sort the data by a given sort order and then create a new index that reflects the sort order, you can do this by setting ignore_index to True.

df.sort_values(by='sessions', ignore_index=True)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
1	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
2	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
5	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
6	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00

Using key to sort after applying a lambda function

The new key argument added to the sort_values() function in Pandas 1.1 allows you to apply a lambda function to the column before sorting the data. This can be very useful when you want to sort object data that represent currency or monetary values, where the presence of a pound sign or dollar sign aids formatting but turns a float into an object, which can’t be sorted as effectively. For example, let’s see what happens when we attempt to sort the Revenue column in descending order. The values, as you can see below, don’t get sorted as you would expect.

df.sort_values(by='revenue', ascending=False)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

By using the key parameter we can define a lambda function that gets applied to the column before the data gets sorted, but which retains the original formatting. The example below uses the replace() function to replace the pound sign with a blank space, then uses astype() to cast the object dtype to a float dtype, allowing it to be sorted correctly. The sort_values() returns the sorted values, but leaves the pound sign in the data.

df.sort_values(by='revenue', 
               ascending=False, 
               key=lambda val: val.str.replace('£', '').astype(float)
              )

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00

Using sort_index() to sort a dataframe by index number

Sometimes, you might want to sort the data in a dataframe based on the index itself. You can do this using the sort_index() method. The sort_index() method has the same parameters as sort_values(), but it only has one parameter, axis, which is set to 0 by default. The axis parameter is used to specify whether you want to sort the index or the columns. The axis parameter can be set to 0 to sort the index, or 1 to sort the columns.

df.sort_index(ascending=True)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

Using sort_index() to sort a dataframe by column name

You can also use the sort_index() function to sort a Pandas dataframe by column name. By default, sort_index() sets ascending to True, so columns will appear in alphabetical order, however, you can easily change this by using df.sort_index(axis=1, ascending=False) instead of df.sort_index(axis=1).

df.sort_index(axis=1)

	dates	datetimes	day	events	page	pageviews	revenue	score	sessions	users
6	2022-01-07	2022-01-01 15:00:00	Friday	2563	7	300	£1008.98	-1	233.0	42.0
4	2022-01-05	2022-01-01 13:00:00	Thursday	374	5	4322	£584.35	4	427.0	NaN
1	2022-01-02	2022-01-01 10:00:00	Wednesday	24663	2	35	£357.35	2	433.0	34.0
3	2022-01-04	2022-01-01 12:00:00	Sunday	24463	4	3423	£4732.34	-3	478.0	46.0
0	2022-01-01	2022-01-01 09:00:00	Monday	2347	1	837	£342.34	1	493.0	34.0
2	2022-01-03	2022-01-01 11:00:00	Tuesday	36323	3	2342	£753.35	3	NaN	45.0
5	2022-01-06	2022-01-01 14:00:00	Saturday	322	6	332	£2824.23	-2	NaN	35.0

Using key and abs to sort a dataframe by absolute value

By default, when you sort a numeric column it will appear in ascending order, so if you’ve got a mixture of positive and negative values, the negative values will appear at the top of the column. You can flip the orientation so that the values appear in descending order by using df.sort_values(by='column_name', ascending=False).

df.sort_values(by='score')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00

In rare cases, you might want to consider negative and positive values equally via their absolute value. You can also do this using the key argument and by passing the Pandas abs function to it.

df.sort_values(by='score', key=pd.Series.abs)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00

Using sort_values() to sort by date

Dates are also handled by sort_values() by default. You can sort by date by passing the column name to the by argument. The default sorting order is ascending, so the earliest date will be at the top of the dataframe. You can change this by passing ascending=False to the sort_values() function.

df.sort_values(by='dates', ascending=False)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00

Using sort_values() to sort by datetime

You can also sort by datetime by passing the column name to the by argument. The default sorting order is ascending, so the earliest datetime will be at the top of the dataframe. You can change this by passing ascending=False to the sort_values() function.

df.sort_values(by='datetimes', ascending=True)

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00

Using a CategoricalDtype to create a custom sort order

Finally, you can even create custom sort orders of your own using the Pandas CategoricalDtype class. While the ascending parameter allows you to put categorical variables or object dtypes into alphabetical order, sometimes this won’t make sense. For example, as we saw above, weekdays in the Day column of our test dataset are placed in alphabetical order, but this doesn’t represent the order in which they should be sorted, i.e. Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. Similarly, month names and product sizes, such as S, XS, M, L, XL, XXL etc, require special treatment via the creation of a custom sort order.

A really neat way to do this is via the Pandas CategoricalDtype class of pandas.api.types. To use this, you’ll first need to load the class, then create a list of values to which you want to set the CategoricalDtype, and finally pass in the ordered=True argument which defines the custom sort order. Once that’s done, you can use the astype() function to cast the object data to a new category dtype, then you can pass the column name to by and sort the data using your custom sort order.

from pandas.api.types import CategoricalDtype

weekday = CategoricalDtype(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], ordered=True)

df['day'] = df['day'].astype(weekday)

df.sort_values(by='day')

	page	day	users	sessions	pageviews	events	revenue	score	dates	datetimes
0	1	Monday	34.0	493.0	837	2347	£342.34	1	2022-01-01	2022-01-01 09:00:00
2	3	Tuesday	45.0	NaN	2342	36323	£753.35	3	2022-01-03	2022-01-01 11:00:00
1	2	Wednesday	34.0	433.0	35	24663	£357.35	2	2022-01-02	2022-01-01 10:00:00
4	5	Thursday	NaN	427.0	4322	374	£584.35	4	2022-01-05	2022-01-01 13:00:00
6	7	Friday	42.0	233.0	300	2563	£1008.98	-1	2022-01-07	2022-01-01 15:00:00
5	6	Saturday	35.0	NaN	332	322	£2824.23	-2	2022-01-06	2022-01-01 14:00:00
3	4	Sunday	46.0	478.0	3423	24463	£4732.34	-3	2022-01-04	2022-01-01 12:00:00

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 7 entries, 6 to 5
Data columns (total 10 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   page       7 non-null      int64         
 1   day        7 non-null      category      
 2   users      6 non-null      float64       
 3   sessions   5 non-null      float64       
 4   pageviews  7 non-null      int64         
 5   events     7 non-null      int64         
 6   revenue    7 non-null      object        
 7   score      7 non-null      int64         
 8   dates      7 non-null      datetime64[ns]
 9   datetimes  7 non-null      datetime64[ns]
dtypes: category(1), datetime64[ns](2), float64(2), int64(4), object(1)
memory usage: 923.0+ bytes

Matt Clarke, Sunday, November 06, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.