While many Pandas operations don’t require or benefit from an explicitly named index on the dataframe, named indexes (or indices) can be beneficial for some tasks because a wide range of functions are designed to let you manipulate the data based on index values.
By default, Pandas will add a basic numeric index, and this is fine for most tasks. However, you can also add explicitly named indexes, with either a single level or multiple (hierarchical) levels, using the `pandas.DataFrame.set_index()` method. Similarly, you can remove, drop, or reset indexes using the `pandas.DataFrame.reset_index()` method. In this quick tutorial I’ll show you how to use them.
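To illustrate the kind of index-based manipulation mentioned above, here is a minimal sketch using a trimmed copy of the dummy fish data created below. It compares a boolean filter on a regular column with a direct label lookup via `.loc` (an accessor not otherwise covered here) once `species` is the index. It assumes species names are unique, so the label lookup returns a single value.

```python
import pandas as pd

# A trimmed copy of the dummy fish data used in this tutorial
df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2),
     ('Corydoras aeneus', 'Corydoras', 5.3)],
    columns=['species', 'genus', 'length_cm'],
)

# Without a named index, finding a row by species needs a boolean filter
by_filter = df.loc[df['species'] == 'Coptodon snyderae', 'length_cm'].iloc[0]

# With species as the index, the same value is a direct label lookup
indexed = df.set_index('species')
by_label = indexed.loc['Coptodon snyderae', 'length_cm']

print(by_filter, by_label)  # 8.2 8.2
```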
To get started, open a Jupyter notebook and import the Pandas library using the `import pandas as pd` naming convention. Then either import data into Pandas from a CSV file or create a dataframe of dummy data to work with. Note that when the dataframe is created, Pandas adds its own numeric index without you needing to explicitly set a column to use as the index.
```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2),
     ('Coptodon discolor', 'Coptodon', 18.5),
     ('Astronotus ocellatus', 'Astronotus', 31.2),
     ('Astronotus crassipinnis', 'Astronotus', 31.2),
     ('Corydoras aeneus', 'Corydoras', 5.3),
     ('Corydoras paleatus', 'Corydoras', 6.3),
     ('Xenomystus nigri', 'Xenomystus', 5.3)
    ],
    columns=['species', 'genus', 'length_cm']
)
df
```
| | species | genus | length_cm |
|---|---|---|---|
| 0 | Pterophyllum altum | Pterophyllum | 12.5 |
| 1 | Coptodon snyderae | Coptodon | 8.2 |
| 2 | Coptodon discolor | Coptodon | 18.5 |
| 3 | Astronotus ocellatus | Astronotus | 31.2 |
| 4 | Astronotus crassipinnis | Astronotus | 31.2 |
| 5 | Corydoras aeneus | Corydoras | 5.3 |
| 6 | Corydoras paleatus | Corydoras | 6.3 |
| 7 | Xenomystus nigri | Xenomystus | 5.3 |
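If you load the data from a CSV file instead, the same default numeric index appears. The sketch below assumes a small hypothetical CSV held in memory with `io.StringIO` in place of a real file on disk, and checks that `pd.read_csv()` gives the dataframe a zero-based `RangeIndex`.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk
csv_text = """species,genus,length_cm
Pterophyllum altum,Pterophyllum,12.5
Coptodon snyderae,Coptodon,8.2
"""

df = pd.read_csv(io.StringIO(csv_text))

# No column was chosen as the index, so Pandas adds a default RangeIndex
print(type(df.index).__name__)  # RangeIndex
print(df.index.tolist())        # [0, 1]
```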
If you wish to designate a specific column in your dataframe as the index, you can do so using the `pandas.DataFrame.set_index()` method. For a basic single-level index, this takes a string corresponding to the name of the column you wish to set as the index.
`set_index()` is called as a method on the dataframe object and returns a new dataframe, leaving the original unchanged. If you want to keep the output and overwrite the original dataframe, you need to reassign it, i.e. `df = df.set_index('species')`.
The method does include an `inplace=True` option to avoid the need to do this, but `inplace=True` may be deprecated in a future version of Pandas, so it’s best to avoid using it in your code. After running the code you’ll see that the numeric index has been replaced by the values from the `species` column.
```python
df = df.set_index('species')
df
```
| | genus | length_cm |
|---|---|---|
| species | | |
| Pterophyllum altum | Pterophyllum | 12.5 |
| Coptodon snyderae | Coptodon | 8.2 |
| Coptodon discolor | Coptodon | 18.5 |
| Astronotus ocellatus | Astronotus | 31.2 |
| Astronotus crassipinnis | Astronotus | 31.2 |
| Corydoras aeneus | Corydoras | 5.3 |
| Corydoras paleatus | Corydoras | 6.3 |
| Xenomystus nigri | Xenomystus | 5.3 |
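To see why the reassignment matters, here is a minimal sketch on a two-row copy of the dummy data. It shows that `set_index()` leaves the original dataframe untouched and that only reassigning the result changes what `df` refers to. The name `indexed` is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2)],
    columns=['species', 'genus', 'length_cm'],
)

# set_index() returns a new dataframe and leaves df untouched
indexed = df.set_index('species')
print(df.columns.tolist())       # ['species', 'genus', 'length_cm']
print(indexed.columns.tolist())  # ['genus', 'length_cm']

# Only reassigning the result changes what df refers to
df = df.set_index('species')
print(df.index.name)             # species
```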
To remove the index and revert to a regular numeric index, you can use the `pandas.DataFrame.reset_index()` method. This accepts a few different parameters, but the defaults are sufficient in most cases.
Calling `df = df.reset_index()` moves the `species` values out of the index and back into a regular column of the dataframe, replacing them with a default numeric index.
As with `set_index()`, `reset_index()` returns a new dataframe rather than modifying the original, so you either need to pass its optional `inplace=True` parameter or, better, assign the result back to the original dataframe object.
```python
df = df.reset_index()
df
```
| | species | genus | length_cm |
|---|---|---|---|
| 0 | Pterophyllum altum | Pterophyllum | 12.5 |
| 1 | Coptodon snyderae | Coptodon | 8.2 |
| 2 | Coptodon discolor | Coptodon | 18.5 |
| 3 | Astronotus ocellatus | Astronotus | 31.2 |
| 4 | Astronotus crassipinnis | Astronotus | 31.2 |
| 5 | Corydoras aeneus | Corydoras | 5.3 |
| 6 | Corydoras paleatus | Corydoras | 6.3 |
| 7 | Xenomystus nigri | Xenomystus | 5.3 |
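Because `reset_index()` inserts the former index as the first column, setting and then resetting the index gets you back to the original dataframe, provided the column you indexed on was originally the first one (as `species` is here). A minimal sketch on a two-row copy of the dummy data:

```python
import pandas as pd

original = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2)],
    columns=['species', 'genus', 'length_cm'],
)

# reset_index() puts the former index back as the first column
round_trip = original.set_index('species').reset_index()

print(round_trip.columns.tolist())  # ['species', 'genus', 'length_cm']
print(round_trip.equals(original))  # True
```

If the indexed column had been somewhere else, the values would survive but the column order would differ.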
Sometimes your index is of so little use that, rather than turning it back into a column, you want to reset it to a numeric index and discard its values entirely. You can do this using the `drop=True` argument.
First, we’ll set the `species` column as the index of the dataframe, then we’ll call `df = df.reset_index('species', drop=True)` to reset the index and drop the `species` values instead of restoring them as a column. This returns a new dataframe that we’ll use to overwrite the original, which now contains only two columns, `genus` and `length_cm`.
```python
df = df.set_index('species')
df
```
| | genus | length_cm |
|---|---|---|
| species | | |
| Pterophyllum altum | Pterophyllum | 12.5 |
| Coptodon snyderae | Coptodon | 8.2 |
| Coptodon discolor | Coptodon | 18.5 |
| Astronotus ocellatus | Astronotus | 31.2 |
| Astronotus crassipinnis | Astronotus | 31.2 |
| Corydoras aeneus | Corydoras | 5.3 |
| Corydoras paleatus | Corydoras | 6.3 |
| Xenomystus nigri | Xenomystus | 5.3 |
```python
df = df.reset_index('species', drop=True)
df
```
| | genus | length_cm |
|---|---|---|
| 0 | Pterophyllum | 12.5 |
| 1 | Coptodon | 8.2 |
| 2 | Coptodon | 18.5 |
| 3 | Astronotus | 31.2 |
| 4 | Astronotus | 31.2 |
| 5 | Corydoras | 5.3 |
| 6 | Corydoras | 6.3 |
| 7 | Xenomystus | 5.3 |
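The difference between the default behaviour and `drop=True` is easiest to see side by side. In this minimal sketch on a two-row copy of the dummy data, the default keeps the index values as a `species` column, while `drop=True` throws them away. For a single-level index, passing the level name as in `reset_index('species', drop=True)` gives the same result as `reset_index(drop=True)`.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2)],
    columns=['species', 'genus', 'length_cm'],
).set_index('species')

# Default: the index values come back as a 'species' column
kept = df.reset_index()

# drop=True: the index values are discarded, leaving only a numeric index
dropped = df.reset_index(drop=True)

print(kept.columns.tolist())     # ['species', 'genus', 'length_cm']
print(dropped.columns.tolist())  # ['genus', 'length_cm']
```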
Pandas also supports dataframes with multi-level indexes. These can be useful for displaying certain datasets, but I tend to use them only rarely myself.
Multi-level indexes (or indices) can be added to Pandas dataframes by passing a list of column names to the `set_index()` method. For example, if we pass `['species', 'genus']` to `set_index()`, Pandas will create an index with two levels.
```python
df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2),
     ('Coptodon discolor', 'Coptodon', 18.5),
     ('Astronotus ocellatus', 'Astronotus', 31.2),
     ('Astronotus crassipinnis', 'Astronotus', 31.2),
     ('Corydoras aeneus', 'Corydoras', 5.3),
     ('Corydoras paleatus', 'Corydoras', 6.3),
     ('Xenomystus nigri', 'Xenomystus', 5.3)
    ],
    columns=['species', 'genus', 'length_cm']
).set_index(['species', 'genus'])
df
```
| | | length_cm |
|---|---|---|
| species | genus | |
| Pterophyllum altum | Pterophyllum | 12.5 |
| Coptodon snyderae | Coptodon | 8.2 |
| Coptodon discolor | Coptodon | 18.5 |
| Astronotus ocellatus | Astronotus | 31.2 |
| Astronotus crassipinnis | Astronotus | 31.2 |
| Corydoras aeneus | Corydoras | 5.3 |
| Corydoras paleatus | Corydoras | 6.3 |
| Xenomystus nigri | Xenomystus | 5.3 |
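Once a dataframe has a multi-level index, you can inspect its levels and select rows by them. The sketch below, on a three-row copy of the dummy data, uses a full `(species, genus)` tuple with `.loc` and the `xs()` method to pick out every row for one genus. Neither selection technique is covered elsewhere in this tutorial; they are shown only to illustrate what the two levels give you.

```python
import pandas as pd

df = pd.DataFrame(
    [('Coptodon snyderae', 'Coptodon', 8.2),
     ('Coptodon discolor', 'Coptodon', 18.5),
     ('Corydoras aeneus', 'Corydoras', 5.3)],
    columns=['species', 'genus', 'length_cm'],
).set_index(['species', 'genus'])

print(df.index.nlevels)      # 2
print(list(df.index.names))  # ['species', 'genus']

# A full row label is a tuple with one value per index level
print(df.loc[('Coptodon discolor', 'Coptodon'), 'length_cm'])  # 18.5

# xs() selects every row matching one value on the chosen level
coptodon = df.xs('Coptodon', level='genus')
print(coptodon)
```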
If you want to remove a multi-level index from a Pandas dataframe, you can use the `reset_index()` method we covered above. By default, if you don’t specify which index level you wish to remove, Pandas will remove them all and the index values will revert to being columns within the dataframe.
```python
df.reset_index()
```
| | species | genus | length_cm |
|---|---|---|---|
| 0 | Pterophyllum altum | Pterophyllum | 12.5 |
| 1 | Coptodon snyderae | Coptodon | 8.2 |
| 2 | Coptodon discolor | Coptodon | 18.5 |
| 3 | Astronotus ocellatus | Astronotus | 31.2 |
| 4 | Astronotus crassipinnis | Astronotus | 31.2 |
| 5 | Corydoras aeneus | Corydoras | 5.3 |
| 6 | Corydoras paleatus | Corydoras | 6.3 |
| 7 | Xenomystus nigri | Xenomystus | 5.3 |
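Note that `df.reset_index()` above was not assigned back to `df`, so the dataframe still carries its two-level index. A minimal sketch on a two-row copy of the dummy data confirms this and shows that all levels come back as columns, in index-level order, in front of the existing columns. The name `flat` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2)],
    columns=['species', 'genus', 'length_cm'],
).set_index(['species', 'genus'])

# With no level given, every index level becomes a column again
flat = df.reset_index()

print(flat.columns.tolist())      # ['species', 'genus', 'length_cm']
print(type(flat.index).__name__)  # RangeIndex

# The result was not reassigned, so df keeps both index levels
print(df.index.nlevels)           # 2
```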
If you want to turn a dataframe with a multi-level index into one with a single-level index, you can explicitly name the index level you wish to reset. Levels are numbered from zero, so to reset the `genus` level and turn its values back into a regular column, we’d enter `df.reset_index(level=1)`, because it’s the second index level.
```python
df
```
| | | length_cm |
|---|---|---|
| species | genus | |
| Pterophyllum altum | Pterophyllum | 12.5 |
| Coptodon snyderae | Coptodon | 8.2 |
| Coptodon discolor | Coptodon | 18.5 |
| Astronotus ocellatus | Astronotus | 31.2 |
| Astronotus crassipinnis | Astronotus | 31.2 |
| Corydoras aeneus | Corydoras | 5.3 |
| Corydoras paleatus | Corydoras | 6.3 |
| Xenomystus nigri | Xenomystus | 5.3 |
```python
df.reset_index(level=1)
```
| | genus | length_cm |
|---|---|---|
| species | | |
| Pterophyllum altum | Pterophyllum | 12.5 |
| Coptodon snyderae | Coptodon | 8.2 |
| Coptodon discolor | Coptodon | 18.5 |
| Astronotus ocellatus | Astronotus | 31.2 |
| Astronotus crassipinnis | Astronotus | 31.2 |
| Corydoras aeneus | Corydoras | 5.3 |
| Corydoras paleatus | Corydoras | 6.3 |
| Xenomystus nigri | Xenomystus | 5.3 |
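The `level` argument of `reset_index()` also accepts a level name, which can be easier to read than a position. This minimal sketch on a two-row copy of the dummy data checks that `level=1` and `level='genus'` give the same result, and that the remaining index is a single level named `species`. The names `by_position` and `by_name` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Pterophyllum', 12.5),
     ('Coptodon snyderae', 'Coptodon', 8.2)],
    columns=['species', 'genus', 'length_cm'],
).set_index(['species', 'genus'])

# level accepts either a position (counted from zero) or a level name
by_position = df.reset_index(level=1)
by_name = df.reset_index(level='genus')

print(by_position.equals(by_name))   # True
print(by_position.index.name)        # species
print(by_position.columns.tolist())  # ['genus', 'length_cm']
```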
Matt Clarke, Sunday, January 01, 2023