How to use Pandas assign() to create new dataframe columns

Learn how to use the Pandas assign() method to create new dataframe columns and avoid the dreaded SettingWithCopyWarning problem.

Picture by Fauxels, Pexels.

The Pandas assign() method is used to create new columns in a dataframe, usually based on calculations. Each keyword argument you pass to assign() becomes a column: the keyword is the name of the new column, and the value is either something to assign directly (the result of a calculation on existing dataframe columns, an array, or a scalar) or a function such as a lambda, which assign() calls on the dataframe to compute the column.
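As a minimal sketch of those kinds of value, the snippet below uses a cut-down version of the fish dataframe from this tutorial. The source column and its 'survey' value are hypothetical, added only to show a scalar being repeated on every row.

```python
import pandas as pd

# A cut-down version of the fish dataframe used in this tutorial
df = pd.DataFrame({'length_cm': [12.56, 11.82, 14.23]})

out = df.assign(
    source='survey',                           # hypothetical column: a scalar is repeated on every row
    length_mm=df['length_cm'] * 10,            # a Series calculated from an existing column
    length_m=lambda x: x['length_cm'] / 100,   # a callable, called with the dataframe as x
)

print(out.columns.tolist())
# ['length_cm', 'source', 'length_mm', 'length_m']
```

The new columns are appended after the existing ones, in the order the keyword arguments were given.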

The assign() method returns a new dataframe with the new column added, and the original dataframe is not modified, so you need to save the output back to the original dataframe variable (or to a new one) to retain the new column. In this simple tutorial, I'll show how you can use the Pandas assign() method to avoid the Pandas SettingWithCopyWarning and how to use it in method chaining code.
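Here is a short sketch of both points. The first part shows that df is left untouched unless you reassign the result. The second part filters the dataframe and then adds a column with assign() instead of writing subset['length_mm'] = ... into the filtered subset, which is the pattern that can trigger SettingWithCopyWarning in pandas versions where Copy-on-Write is not enabled.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 12.56),
     ('Pterophyllum scalare', 11.82),
     ('Pterophyllum leopoldi', 14.23)],
    columns=['species', 'length_cm']
)

# assign() returns a new dataframe; df itself is left unchanged
result = df.assign(length_mm=lambda x: x['length_cm'] * 10)
print(df.columns.tolist())      # ['species', 'length_cm']
print(result.columns.tolist())  # ['species', 'length_cm', 'length_mm']

# Filtering produces a subset of df. Setting a column on it directly with
# subset['length_mm'] = ... may raise SettingWithCopyWarning (without
# Copy-on-Write); assign() builds a new dataframe instead.
subset = df[df['length_cm'] > 12]
subset = subset.assign(length_mm=lambda x: x['length_cm'] * 10)
print(subset['species'].tolist())  # ['Pterophyllum altum', 'Pterophyllum leopoldi']
```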

Create a Pandas dataframe

To get started, open a Jupyter notebook and import the Pandas library using the pandas as pd naming convention, then create a Pandas dataframe. This one contains the scientific name of some fish species and their lengths in centimetres. We’ll perform some calculations on these lengths in a second.


import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 12.56), 
     ('Pterophyllum scalare', 11.82),
     ('Pterophyllum leopoldi', 14.23)], 
    columns=['species', 'length_cm']
)
df
                 species  length_cm
0     Pterophyllum altum      12.56
1   Pterophyllum scalare      11.82
2  Pterophyllum leopoldi      14.23

Use assign() to create a new column

The most common way to use the Pandas assign() method is to call it on a Pandas dataframe object. The keyword argument name, length_mm, becomes the name of the new column, while its value, df['length_cm'] * 10, takes each length_cm value, multiplies it by 10, and assigns the result to length_mm. The assign() method returns a dataframe containing the new column, so we'll need to reassign this back to df to save it.

df = df.assign(length_mm=df['length_cm'] * 10)
df
                 species  length_cm  length_mm
0     Pterophyllum altum      12.56      125.6
1   Pterophyllum scalare      11.82      118.2
2  Pterophyllum leopoldi      14.23      142.3
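Because the new column's name is written as a Python keyword argument, it has to be a valid Python identifier. For a name containing spaces or brackets, such as the hypothetical 'length (mm)' below, one workaround is to unpack a dictionary into assign() with **.

```python
import pandas as pd

df = pd.DataFrame({'length_cm': [12.56, 11.82, 14.23]})

# 'length (mm)' can't be written as a keyword argument, so pass it via a dict
df = df.assign(**{'length (mm)': lambda x: x['length_cm'] * 10})
print(df.columns.tolist())  # ['length_cm', 'length (mm)']
```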

Use assign() to create a new column using a lambda function

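You can achieve the same thing using a lambda function. Here, assign() calls the lambda with the dataframe as its argument x, and the values it returns become the new column. Importantly, if you're using the modern method chaining approach, you will need to use lambda functions; otherwise your calculation refers to the dataframe variable as it was before the chain began, not the intermediate result at the current step within your chain.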

df = df.assign(length_m=lambda x: x['length_cm'] / 100)
df
                 species  length_cm  length_mm  length_m
0     Pterophyllum altum      12.56      125.6    0.1256
1   Pterophyllum scalare      11.82      118.2    0.1182
2  Pterophyllum leopoldi      14.23      142.3    0.1423
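To see why the lambda matters in a chain, here is a minimal sketch that filters the rows with query() before ranking them (the length_rank column is hypothetical). The lambda version ranks only the rows that survived the filter, whereas referring to df directly ranks all three original rows, and pandas then aligns that result on the index, silently producing different numbers.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 12.56),
     ('Pterophyllum scalare', 11.82),
     ('Pterophyllum leopoldi', 14.23)],
    columns=['species', 'length_cm']
)

# The lambda receives the filtered dataframe, so only the two remaining rows are ranked
with_lambda = (
    df
    .query('length_cm > 12')
    .assign(length_rank=lambda x: x['length_cm'].rank())
)

# df['length_cm'] still refers to all three rows from before the chain began;
# assign() aligns that Series on the index, so the ranks reflect the full dataframe
without_lambda = (
    df
    .query('length_cm > 12')
    .assign(length_rank=df['length_cm'].rank())
)

print(with_lambda['length_rank'].tolist())     # [1.0, 2.0]
print(without_lambda['length_rank'].tolist())  # [2.0, 3.0]
```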

Use assign() to create multiple columns using lambda functions

The other really neat thing is that you can pass several lambda functions to a single assign() call, one per new column. Here we'll create new columns to hold the length_in and length_ft values for each species, then save them back to the original dataframe.

df = df.assign(length_in=lambda x: x['length_cm'] * 0.393701,
               length_ft=lambda x: x['length_cm'] * 0.0328084)
df
                 species  length_cm  length_mm  length_m  length_in  length_ft
0     Pterophyllum altum      12.56      125.6    0.1256   4.944885   0.412074
1   Pterophyllum scalare      11.82      118.2    0.1182   4.653546   0.387795
2  Pterophyllum leopoldi      14.23      142.3    0.1423   5.602365   0.466864
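Since pandas 0.23 (on Python 3.6 or later), the keyword arguments are computed and assigned in order, so a lambda can use a column created by an earlier keyword in the same assign() call. As a sketch, length_ft could be derived from length_in (12 inches per foot) rather than from length_cm:

```python
import pandas as pd

df = pd.DataFrame({'length_cm': [12.56, 11.82, 14.23]})

df = df.assign(
    length_in=lambda x: x['length_cm'] * 0.393701,
    # x already contains length_in here, because the kwargs are applied in order
    length_ft=lambda x: x['length_in'] / 12,
)
print(df.round(4))
```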

Use assign() with method chaining

Finally, the coolest way to use the Pandas assign() method is with the method chaining technique. This modern Pandas programming style can aid code readability, since each step sits on its own line and you don't need intermediate variables, so it has become more popular, despite remaining a somewhat divisive construct that some data scientists really don't like.

To use assign() in a method chain, you wrap the whole expression in parentheses, which lets you format what is essentially a one-liner into a readable form spread across several lines. In the example below, we'll create a new dataframe called df_chain from the df dataframe we built above, then pass a series of lambda functions to assign() to perform our calculations and create the new columns. Because df already contains these columns from the earlier steps, assign() simply overwrites them with the newly calculated values.

df_chain = (
    df
    # each lambda receives the dataframe at this step of the chain
    .assign(length_mm=lambda x: x['length_cm'] * 10,
            length_m=lambda x: x['length_cm'] / 100,
            length_in=lambda x: x['length_cm'] * 0.393701,
            length_ft=lambda x: x['length_cm'] * 0.0328084)
)

df_chain
                 species  length_cm  length_mm  length_m  length_in  length_ft
0     Pterophyllum altum      12.56      125.6    0.1256   4.944885   0.412074
1   Pterophyllum scalare      11.82      118.2    0.1182   4.653546   0.387795
2  Pterophyllum leopoldi      14.23      142.3    0.1423   5.602365   0.466864
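Method chains become more useful once assign() is combined with other dataframe methods. The sketch below uses only standard DataFrame methods to create a column, keep the fish longer than 120 mm (an arbitrary threshold chosen for illustration), and sort the result from longest to shortest. The query() step can filter on length_mm because it operates on the output of the assign() step before it.

```python
import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 12.56),
     ('Pterophyllum scalare', 11.82),
     ('Pterophyllum leopoldi', 14.23)],
    columns=['species', 'length_cm']
)

long_fish = (
    df
    .assign(length_mm=lambda x: x['length_cm'] * 10)  # new column from the current step
    .query('length_mm > 120')                         # filter on the column just created
    .sort_values('length_mm', ascending=False)        # longest first
    .reset_index(drop=True)                           # renumber the rows 0, 1, ...
)
print(long_fish['species'].tolist())
# ['Pterophyllum leopoldi', 'Pterophyllum altum']
```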

Matt Clarke, Saturday, January 07, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.