How to add a new column to a Pandas dataframe

Picture by Cottonbro, Pexels.

10 minutes to read

Data Science Pandas

Pandas is extremely versatile and includes a wide range of different methods you can use to add a new column or series to an existing dataframe. Whether you want to assign a constant value or scalar to every row, a specific value based on the individual row, or a new column based on a calculation or other operation, Pandas makes it easy.

In this tutorial I’ll show you a range of techniques you can use to quickly and easily add a column to a Pandas dataframe. We’ll cover manually assigning columns, creating new columns from lists, and using the insert(), assign(), and loc techniques.

Create a Pandas dataframe

To get started, open a Jupyter notebook, import Pandas and create a Pandas dataframe. We’ll use this in the series of examples below and will add new columns to the dataframe using a variety of techniques.

import pandas as pd

df = pd.DataFrame({
    'model': ['XF', 'XE', 'XJ'], 
    'top_speed': [120, 121, 145]
})

df

	model	top_speed
0	XF	120
1	XE	121
2	XJ	145

Add a new Pandas column by declaring a name and value

The most common way to add a new column to a Pandas dataframe is simply to declare the new column name and assign either a scalar (a single value that will be applied to every row) or a Python list of values corresponding to each row in the dataframe. In the example below, we’ll define a new column called manufacturer to which we’ll assign a scalar or constant value, then we’ll define a column called mpg to which we’ll assign a list of values.

df['manufacturer'] = 'Jaguar'

df['mpg'] = [45, 49, 41]

df

	model	top_speed	manufacturer	mpg
0	XF	120	Jaguar	45
1	XE	121	Jaguar	49
2	XJ	145	Jaguar	41

Add a new column using insert()

The Pandas insert() function allows you to insert or add a new column to a Pandas dataframe based on either a scalar or constant value or a list of values. The benefit of using insert() over just assigning the values is that you can use the loc argument to define where the new column sits within the dataframe.

In the example below we’ll using insert() to add a column called base_price to which we’ll pass a list of values and we’ll place it in column 2. Since numbers count from 0 in Python, that puts it in the third column, rather than the second.

df.insert(loc=2,
          column='base_price',
          value=[35.1, 32.2, 53.7])

df

	model	top_speed	base_price	manufacturer	mpg
0	XF	120	35.1	Jaguar	45
1	XE	121	32.2	Jaguar	49
2	XJ	145	53.7	Jaguar	41

Add a new column using assign()

The Pandas assign() function works in a similar manner but instead returns a dataframe containing the newly defined column. It can also be assigned a scalar value that you want to add to every row, a list of values to that are assigned to each row in order, and also a lambda function based on a calculated value.

df = df.assign(top_price=[47730, 42345, 122345])

df

	model	top_speed	base_price	manufacturer	mpg	top_price
0	XF	120	35.1	Jaguar	45	47730
1	XE	121	32.2	Jaguar	49	42345
2	XJ	145	53.7	Jaguar	41	122345

Add a new column using .loc[]

The .loc[] method also lets you assign a new column or series to a Pandas dataframe. Personally, I find it a bit clunkier than the other methods, so tend to use the others more often. As with the other methods, you can define a scalar value, a list of values, or perform a calculation based on other columns within the dataframe.

df.loc[:,'0_to_60'] = [5.5, 6.2, 4.1]

df

	model	top_speed	base_price	manufacturer	mpg	top_price	0_to_60
0	XF	120	35.1	Jaguar	45	47730	5.5
1	XE	121	32.2	Jaguar	49	42345	6.2
2	XJ	145	53.7	Jaguar	41	122345	4.1

Add a new column based on a calculation

You can also use most of these approaches to add a new column to a Pandas dataframe based on a calculation or other function performed on one or more other columns in the dataframe. Ignore the fact that the below calculation is totally nonsense, but here’s how you can do this using the various approaches shown above.

df['calculation'] = df['top_speed'] / df['base_price']

df

	model	top_speed	base_price	manufacturer	mpg	top_price	0_to_60	calculation
0	XF	120	35.1	Jaguar	45	47730	5.5	3.418803
1	XE	121	32.2	Jaguar	49	42345	6.2	3.757764
2	XJ	145	53.7	Jaguar	41	122345	4.1	2.700186

df.insert(loc=4,
          column='calculation2',
          value=df['top_speed'] / df['base_price'])

df

	model	top_speed	base_price	manufacturer	calculation2	mpg	top_price	0_to_60	calculation
0	XF	120	35.1	Jaguar	3.418803	45	47730	5.5	3.418803
1	XE	121	32.2	Jaguar	3.757764	49	42345	6.2	3.757764
2	XJ	145	53.7	Jaguar	2.700186	41	122345	4.1	2.700186

df = df.assign(calculation3=df['top_speed'] / df['base_price'])

df

	model	top_speed	base_price	manufacturer	calculation2	mpg	top_price	0_to_60	calculation	calculation3
0	XF	120	35.1	Jaguar	3.418803	45	47730	5.5	3.418803	3.418803
1	XE	121	32.2	Jaguar	3.757764	49	42345	6.2	3.757764	3.757764
2	XJ	145	53.7	Jaguar	2.700186	41	122345	4.1	2.700186	2.700186

df.loc[:,'calculation4'] = df['top_speed'] / df['base_price']

df

	model	top_speed	base_price	manufacturer	calculation2	mpg	top_price	0_to_60	calculation	calculation3	calculation4
0	XF	120	35.1	Jaguar	3.418803	45	47730	5.5	3.418803	3.418803	3.418803
1	XE	121	32.2	Jaguar	3.757764	49	42345	6.2	3.757764	3.757764	3.757764
2	XJ	145	53.7	Jaguar	2.700186	41	122345	4.1	2.700186	2.700186	2.700186

Matt Clarke, Saturday, November 05, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.