The Pandas map() function can be used to map the values of a series to another set of values, or to run a custom function on each value. It runs at the series level, rather than across a whole dataframe, and is a very useful method for engineering new features based on the values of other columns.
In this simple tutorial, we will look at how to use the map() function to map values in a series to another set of values, both using a custom function and using a mapping from a Python dictionary.
To get started, import the Pandas library using the import pandas as pd naming convention, then create a Pandas dataframe containing some dummy data. We’ll create a tiny dataframe containing the scientific names of some fish species and their lengths.
import pandas as pd

df = pd.DataFrame(
    [
        ('Pterophyllum altum', 'Pterophyllum', 12.5),
        ('Coptodon snyderae', 'Coptodon', 8.2),
        ('Astronotus ocellatus', 'Astronotus', 31.2),
        ('Corydoras aeneus', 'Corydoras', 5.3),
        ('Xenomystus nigri', 'Xenomystus', 5.3),
    ],
    columns=['species', 'genus', 'length_cm']
)
df
| | species | genus | length_cm |
---|---|---|---|
0 | Pterophyllum altum | Pterophyllum | 12.5 |
1 | Coptodon snyderae | Coptodon | 8.2 |
2 | Astronotus ocellatus | Astronotus | 31.2 |
3 | Corydoras aeneus | Corydoras | 5.3 |
4 | Xenomystus nigri | Xenomystus | 5.3 |
First, we’ll look at how to use the map() function to map the values in a Pandas column or series to the values in a Python dictionary. We’ll create a dictionary called mappings that contains the genus as the key and the family as the value. Then we’ll use the map() function to map the values in the genus column to the values in the mappings dictionary and save the results to a new column called family.
mappings = {
    'Pterophyllum': 'Cichlidae',
    'Coptodon': 'Cichlidae',
    'Astronotus': 'Cichlidae',
    'Corydoras': 'Callichthyidae',
}
df['family'] = df['genus'].map(mappings)
df
When the map() function finds a match for the column value in the dictionary, it passes the dictionary value back so it’s stored in the new column. If no matching key is found in the dictionary, the map() function returns a NaN value. That is what happens to the Xenomystus row here, because the mappings dictionary has no entry for that genus. You can use the Pandas fillna() function to handle any such values.
| | species | genus | length_cm | family |
---|---|---|---|---|
0 | Pterophyllum altum | Pterophyllum | 12.5 | Cichlidae |
1 | Coptodon snyderae | Coptodon | 8.2 | Cichlidae |
2 | Astronotus ocellatus | Astronotus | 31.2 | Cichlidae |
3 | Corydoras aeneus | Corydoras | 5.3 | Callichthyidae |
4 | Xenomystus nigri | Xenomystus | 5.3 | NaN |
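As a minimal sketch of the fillna() step mentioned above, the snippet below maps a small series of genus names and then replaces the unmatched value. The 'Unknown' placeholder is an arbitrary label chosen for illustration, not something the tutorial prescribes.

```python
import pandas as pd

# Same genus-to-family dictionary as in the tutorial; 'Xenomystus' is absent.
mappings = {
    'Pterophyllum': 'Cichlidae',
    'Coptodon': 'Cichlidae',
    'Astronotus': 'Cichlidae',
    'Corydoras': 'Callichthyidae',
}

genus = pd.Series(['Pterophyllum', 'Corydoras', 'Xenomystus'])

# map() leaves NaN where the key is missing from the dictionary;
# fillna() then swaps in a placeholder ('Unknown' is an assumed label).
family = genus.map(mappings).fillna('Unknown')
print(family.tolist())  # ['Cichlidae', 'Callichthyidae', 'Unknown']
```

Whether a placeholder string or a genuine missing value is more appropriate depends on how the column is used later, for example whether missing families should be counted as their own category.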
The other way to use the Pandas map() function is to map values in a column to new values using a custom function. This allows you to use some more complex logic to select how a Pandas column value is mapped to some other value.
We’ll first create a little custom function called get_size_label() that takes the value from the length_cm column and returns a string label for the size of the fish. We’ll then use the map() function to apply this function to each value in the length_cm column and create a new column called size with the size label for each fish.
def get_size_label(length_cm):
    if length_cm < 10:
        return 'small'
    elif length_cm < 20:
        return 'medium'
    else:
        return 'large'

df['size'] = df['length_cm'].map(get_size_label)
df
| | species | genus | length_cm | family | size |
---|---|---|---|---|---|
0 | Pterophyllum altum | Pterophyllum | 12.5 | Cichlidae | medium |
1 | Coptodon snyderae | Coptodon | 8.2 | Cichlidae | small |
2 | Astronotus ocellatus | Astronotus | 31.2 | Cichlidae | large |
3 | Corydoras aeneus | Corydoras | 5.3 | Callichthyidae | small |
4 | Xenomystus nigri | Xenomystus | 5.3 | NaN | small |
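One thing worth checking when mapping with a custom function is how missing values behave. The sketch below assumes a length series that contains a NaN, which the tutorial's data does not. Because comparisons against NaN are always False, get_size_label() would fall through to 'large'. Passing na_action='ignore' to map() is one way to leave missing values untouched instead.

```python
import numpy as np
import pandas as pd

def get_size_label(length_cm):
    if length_cm < 10:
        return 'small'
    elif length_cm < 20:
        return 'medium'
    else:
        return 'large'

# Hypothetical lengths with one missing measurement (not from the tutorial data).
lengths = pd.Series([12.5, np.nan, 5.3])

# Without na_action, NaN reaches the function and ends up labelled 'large'.
naive = lengths.map(get_size_label)
print(naive.tolist())  # ['medium', 'large', 'small']

# With na_action='ignore', the function is skipped for NaN and NaN is kept.
safe = lengths.map(get_size_label, na_action='ignore')
print(safe.isna().tolist())  # [False, True, False]
```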
Matt Clarke, Sunday, January 08, 2023