How to add a new column to a Pandas dataframe

Learn how to add new columns to a Pandas dataframe using insert, loc, assign, and by manually declaring the columns using scalars, lists, and calculated values.

How to add a new column to a Pandas dataframe
Picture by Cottonbro, Pexels.
10 minutes to read

Pandas is extremely versatile and includes a wide range of different methods you can use to add a new column or series to an existing dataframe. Whether you want to assign a constant value or scalar to every row, a specific value based on the individual row, or a new column based on a calculation or other operation, Pandas makes it easy.

In this tutorial I’ll show you a range of techniques you can use to quickly and easily add a column to a Pandas dataframe. We’ll cover manually assigning columns, creating new columns from lists, and using the insert(), assign(), and loc techniques.

Create a Pandas dataframe

To get started, open a Jupyter notebook, import Pandas and create a Pandas dataframe. We’ll use this in the series of examples below and will add new columns to the dataframe using a variety of techniques.

import pandas as pd

df = pd.DataFrame({
    'model': ['XF', 'XE', 'XJ'], 
    'top_speed': [120, 121, 145]
})

df
model top_speed
0 XF 120
1 XE 121
2 XJ 145

Add a new Pandas column by declaring a name and value

The most common way to add a new column to a Pandas dataframe is simply to declare the new column name and assign either a scalar (a single value that will be applied to every row) or a Python list of values corresponding to each row in the dataframe. In the example below, we’ll define a new column called manufacturer to which we’ll assign a scalar or constant value, then we’ll define a column called mpg to which we’ll assign a list of values.

df['manufacturer'] = 'Jaguar'
df['mpg'] = [45, 49, 41]
df
model top_speed manufacturer mpg
0 XF 120 Jaguar 45
1 XE 121 Jaguar 49
2 XJ 145 Jaguar 41

Add a new column using insert()

The Pandas insert() function allows you to insert or add a new column to a Pandas dataframe based on either a scalar or constant value or a list of values. The benefit of using insert() over just assigning the values is that you can use the loc argument to define where the new column sits within the dataframe.

In the example below we’ll using insert() to add a column called base_price to which we’ll pass a list of values and we’ll place it in column 2. Since numbers count from 0 in Python, that puts it in the third column, rather than the second.

df.insert(loc=2,
          column='base_price',
          value=[35.1, 32.2, 53.7])
df
model top_speed base_price manufacturer mpg
0 XF 120 35.1 Jaguar 45
1 XE 121 32.2 Jaguar 49
2 XJ 145 53.7 Jaguar 41

Add a new column using assign()

The Pandas assign() function works in a similar manner but instead returns a dataframe containing the newly defined column. It can also be assigned a scalar value that you want to add to every row, a list of values to that are assigned to each row in order, and also a lambda function based on a calculated value.

df = df.assign(top_price=[47730, 42345, 122345])
df
model top_speed base_price manufacturer mpg top_price
0 XF 120 35.1 Jaguar 45 47730
1 XE 121 32.2 Jaguar 49 42345
2 XJ 145 53.7 Jaguar 41 122345

Add a new column using .loc[]

The .loc[] method also lets you assign a new column or series to a Pandas dataframe. Personally, I find it a bit clunkier than the other methods, so tend to use the others more often. As with the other methods, you can define a scalar value, a list of values, or perform a calculation based on other columns within the dataframe.

df.loc[:,'0_to_60'] = [5.5, 6.2, 4.1]
df
model top_speed base_price manufacturer mpg top_price 0_to_60
0 XF 120 35.1 Jaguar 45 47730 5.5
1 XE 121 32.2 Jaguar 49 42345 6.2
2 XJ 145 53.7 Jaguar 41 122345 4.1

Add a new column based on a calculation

You can also use most of these approaches to add a new column to a Pandas dataframe based on a calculation or other function performed on one or more other columns in the dataframe. Ignore the fact that the below calculation is totally nonsense, but here’s how you can do this using the various approaches shown above.

df['calculation'] = df['top_speed'] / df['base_price']
df
model top_speed base_price manufacturer mpg top_price 0_to_60 calculation
0 XF 120 35.1 Jaguar 45 47730 5.5 3.418803
1 XE 121 32.2 Jaguar 49 42345 6.2 3.757764
2 XJ 145 53.7 Jaguar 41 122345 4.1 2.700186
df.insert(loc=4,
          column='calculation2',
          value=df['top_speed'] / df['base_price'])
df
model top_speed base_price manufacturer calculation2 mpg top_price 0_to_60 calculation
0 XF 120 35.1 Jaguar 3.418803 45 47730 5.5 3.418803
1 XE 121 32.2 Jaguar 3.757764 49 42345 6.2 3.757764
2 XJ 145 53.7 Jaguar 2.700186 41 122345 4.1 2.700186
df = df.assign(calculation3=df['top_speed'] / df['base_price'])
df
model top_speed base_price manufacturer calculation2 mpg top_price 0_to_60 calculation calculation3
0 XF 120 35.1 Jaguar 3.418803 45 47730 5.5 3.418803 3.418803
1 XE 121 32.2 Jaguar 3.757764 49 42345 6.2 3.757764 3.757764
2 XJ 145 53.7 Jaguar 2.700186 41 122345 4.1 2.700186 2.700186
df.loc[:,'calculation4'] = df['top_speed'] / df['base_price']
df
model top_speed base_price manufacturer calculation2 mpg top_price 0_to_60 calculation calculation3 calculation4
0 XF 120 35.1 Jaguar 3.418803 45 47730 5.5 3.418803 3.418803 3.418803
1 XE 121 32.2 Jaguar 3.757764 49 42345 6.2 3.757764 3.757764 3.757764
2 XJ 145 53.7 Jaguar 2.700186 41 122345 4.1 2.700186 2.700186 2.700186

Matt Clarke, Saturday, November 05, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.