When working with Pandas dataframes it really helps to have clear and consistent naming conventions for column labels or column names, and for the column values themselves. Adding a prefix to the beginning of a column name or column value, or a suffix to the end of a column name or column value is a great way to achieve this.
Pandas includes various functions that can be used to add a prefix or suffix to column names or values, whether applied to an entire dataframe, or to selected columns or series. Each has its pros and cons, so you might need to use a variety of approaches depending on your objective.
In this quick tutorial, we’ll go over some code examples that use the Pandas add_suffix()
and add_prefix()
functions for adding suffixes and prefixes to column labels or names and to series labels or names. We’ll also use the rename()
method and list comprehension to add a prefix or suffix to Pandas column or row values.
To get started, open a Jupyter notebook and import Pandas. Then, either import your own data into Pandas, or create some dummy data in a new dataframe.
import pandas as pd
df = pd.DataFrame({'date': ['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04', '2019-01-05'],
'revenue': [100, 200, 300, 400, 500],
'transactions': [10, 20, 30, 40, 50],
'campaign': ['A', 'B', 'C', 'D', 'E']})
df
date | revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
The first method we’ll look at is the add_prefix()
method which, unsurprisingly, add a prefix to Pandas column labels or names. The add_prefix()
function is actually somewhat limited and takes a single argument containing a string that is to be appended to the beginning of each column name. It returns the original dataframe with the column names prefixed with the provided value.
df1 = df.copy()
df1 = df1.add_prefix('email_')
df1
email_date | email_revenue | email_transactions | email_campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
You can also pass a Series to add_prefix()
. When you do this, the add_prefix()
function will instead prefix the row labels or values. However, personally, I prefer to do this in a different way, which we’ll cover later.
s = pd.Series([1, 2, 3, 4, 5])
s = s.add_prefix('pre_')
s
pre_0 1
pre_1 2
pre_2 3
pre_3 4
pre_4 5
dtype: int64
As you might imagine, the similarly named add_suffix()
method works in a largely identical manner to add_prefix()
but adds a suffix to the column label or column name instead.
df2 = df.copy()
df2 = df2.add_suffix('_email')
df2
date_email | revenue_email | transactions_email | campaign_email | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
Like add_prefix()
, you can also pass a Pandas series object to add_suffix()
and it will then add a suffix to the row labels or values.
s = pd.Series([1, 2, 3, 4, 5])
s = s.add_suffix('_suf')
s
0_suf 1
1_suf 2
2_suf 3
3_suf 4
4_suf 5
dtype: int64
The Pandas rename()
function can also be used to add a prefix to column names.
df3 = df.copy()
df3 = df3.rename(columns={'date': 'email_date'})
df3
email_date | revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
Since the Pandas rename()
function takes a dictionary of old and new column names to pass to its columns
parameter or argument, it can be used to add a prefix to multiple selected columns, unlike the add_prefix()
function which gets applied to the entire dataframe.
df4 = df.copy()
df4 = df4.rename(columns={'date': 'email_date',
'revenue': 'email_revenue'})
df4
email_date | email_revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
List comprehension can also be used to add a prefix to column names.
df5 = df.copy()
df5.columns = ['email_' + col for col in df5.columns]
df5
email_date | email_revenue | email_transactions | email_campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
df6 = df.copy()
df6.columns = ['email_' + col if col in ['revenue', 'transactions'] else col for col in df6.columns]
df6
date | email_revenue | email_transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
df7 = df.copy()
df7.columns = [str(col) + '_suf' for col in df7.columns]
df7
date_suf | revenue_suf | transactions_suf | campaign_suf | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
df8 = df.copy()
df8.columns = [str(col) + '_suf'
if col in ['revenue', 'transactions']
else col for col in df8.columns]
df8
date | revenue_suf | transactions_suf | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | A |
1 | 2019-01-02 | 200 | 20 | B |
2 | 2019-01-03 | 300 | 30 | C |
3 | 2019-01-04 | 400 | 40 | D |
4 | 2019-01-05 | 500 | 50 | E |
To assign a prefix to the values in a specific column you can concatenate the prefix string to the value stored in the Pandas column. In the first example, we’re adding a prefix to a column with an object
dtype, or a string, so we can prepend our prefix string without an issue.
df9 = df.copy()
df9['campaign'] = 'campaign_' + df9['campaign']
df9
date | revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100 | 10 | campaign_A |
1 | 2019-01-02 | 200 | 20 | campaign_B |
2 | 2019-01-03 | 300 | 30 | campaign_C |
3 | 2019-01-04 | 400 | 40 | campaign_D |
4 | 2019-01-05 | 500 | 50 | campaign_E |
In the next example, we’re attempting to prepend an int
dtype value or integer with a string. Unless you cast the dtype of the numeric column to a string using .astype(str)
, Pandas will return an error that says UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U9'), dtype('int64')) -> None
.
df10 = df.copy()
df10['revenue'] = 'campaign_' + df10['revenue'].astype(str)
df10
date | revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | campaign_100 | 10 | A |
1 | 2019-01-02 | campaign_200 | 20 | B |
2 | 2019-01-03 | campaign_300 | 30 | C |
3 | 2019-01-04 | campaign_400 | 40 | D |
4 | 2019-01-05 | campaign_500 | 50 | E |
We can also use string concatenation to add a suffix to specific column values. Since the value needs to be a string in order to be concatenated without throwing an exception, it makes sense to use .astype(str)
to convert the column to a string whenever you use this technique.
df11 = df.copy()
df11['campaign'] = df11['campaign'].astype(str) + '_suf'
df11['revenue'] = df11['revenue'].astype(str) + '_suf'
df11
date | revenue | transactions | campaign | |
---|---|---|---|---|
0 | 2019-01-01 | 100_suf | 10 | A_suf |
1 | 2019-01-02 | 200_suf | 20 | B_suf |
2 | 2019-01-03 | 300_suf | 30 | C_suf |
3 | 2019-01-04 | 400_suf | 40 | D_suf |
4 | 2019-01-05 | 500_suf | 50 | E_suf |
Matt Clarke, Saturday, November 26, 2022