Renaming Pandas dataframe columns is a common task for the data scientist. Neat, consistent column names make your dataframe easier to read and your code cleaner to write and maintain.
It’s a good idea to come up with your own consistent naming convention for formatting Pandas column names as it will tidy up your Pandas code and make it quicker to write code, as you’ll instantly know how the column name should be formatted.
In this quick tutorial, we’ll go over some example code to rename columns in Pandas dataframes. We’ll use the rename()
function to set the column names, show how you can add prefixes and suffixes to column names, convert them to uppercase or lowercase, and even slugify them.
To get started, open a Jupyter notebook and import the Pandas package using import pandas as pd
. Once that’s imported, we’ll create a Pandas dataframe from dictionary containing some content and the Pandas from_dict()
function.
import pandas as pd
data = {'Model ': ['Defender 90', 'Defender 110', 'Defender 130'],
'Wheel base': ['90 inches', '110 inches', '130 inches'],
'BHP': [140, 140, 140],
'Likelihood to break down': ['High', 'High', 'High']
}
df = pd.DataFrame.from_dict(data)
df
Model | Wheel base | BHP | Likelihood to break down | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
The most common way to rename columns in a Pandas dataframe is to use the rename()
function. This is appended to the dataframe, i.e. df.rename()
and is passed a dictionary that maps the current column name to the new column name.
Let’s change the Model
column name to defender model
, the Wheel base
column name to wheelbase inches
, the BHP
column to bhp
, and the Likelihood to break down
column name to unreliability
. Run the code and then print the head()
to see the new column names.
df = df.rename(columns={
'Model ': 'defender model',
'Wheel base': 'wheelbase inches',
'BHP': 'bhp',
'Likelihood to break down': 'unreliability'
})
df.head()
defender model | wheelbase inches | bhp | unreliability | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
Another common thing you may want to do is convert column names to lowercase. This can be done in a couple of different ways. The first way is to append the str.lower()
function to the output of df.columns
. The columns
argument a list of column names on the Pandas dataframe, while appending str.lower()
converts them to lowercase. Reassign the output back to df.columns
and you’ll rename the column names to lowercase.
df.columns = df.columns.str.lower()
df.head()
defender model | wheelbase inches | bhp | unreliability | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
The cleanest way to rename column names is to combine the rename()
function with str.lower()
. To do this we reassign df.rename(columns=str.lower)
back to df
. Alternatively, you can use inplace=True
and skip the df =
bit at the beginning.
df = df.rename(columns=str.lower)
df.head()
defender model | wheelbase inches | bhp | unreliability | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
The process for convert Pandas column names to uppercase is exactly the same as for lowercase, you just need to replace the lower()
function with upper()
instead.
df.columns = df.columns.str.upper()
df.head()
DEFENDER MODEL | WHEELBASE INCHES | BHP | UNRELIABILITY | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
You can also pass str.upper
to the columns argument of rename()
to perform the task of converting column names to uppercase using slightly cleaner code.
df = df.rename(columns=str.upper)
df.head()
DEFENDER MODEL | WHEELBASE INCHES | BHP | UNRELIABILITY | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
It’s also possible to add a prefix to column names using the add_prefix()
function. You can apply this to specific columns if you provide a subset, or you can append add_prefix()
to the dataframe and then pass in the prefix string. This is quite a common one during feature engineering, especially when working with highly dimensional datasets.
df1 = df.add_prefix('prefix_')
df1.head()
prefix_DEFENDER MODEL | prefix_WHEELBASE INCHES | prefix_BHP | prefix_UNRELIABILITY | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
There’s also an add_suffix()
function that works in exactly the same way and appends a string to the column name.
df2 = df.add_suffix('_suffix')
df2.head()
DEFENDER MODEL_suffix | WHEELBASE INCHES_suffix | BHP_suffix | UNRELIABILITY_suffix | |
---|---|---|---|---|
0 | Defender 90 | 90 inches | 140 | High |
1 | Defender 110 | 110 inches | 140 | High |
2 | Defender 130 | 130 inches | 140 | High |
One thing that often causes confusion when working with third-party data that’s been imported from a spreadsheet is the presence of whitespace in the column name. You’ll filter the Pandas dataframe only for it to return no data, or an error, because someone has inadvertently included whitespace in the column name. You can strip whitespace from Pandas column names using df.columns.str.strip().str.lower()
.
df.columns = df.columns.str.strip().str.lower()
df.columns
Index(['defender model', 'wheelbase inches', 'bhp', 'unreliability'], dtype='object')
Finally, one thing I like to do is to slugify column names. This removes whitespace and replaces it with underscores and ensures column names contain only letters, numbers, and underscores. It’s a great way to tidy up column names and keep them consistent.
df.columns = df.columns.str.lower().str.replace('[^0-9a-zA-Z]+','_',regex=True)
df.columns
Index(['defender_model', 'wheelbase_inches', 'bhp', 'unreliability'], dtype='object')
Matt Clarke, Friday, June 10, 2022