How to drop Pandas dataframe rows and columns

Picture by Pixabay, Pexels.

37 minutes to read

Data Science Pandas

When working with Pandas dataframes you’ll often need to drop, remove, or delete columns or rows from a dataframe to leave you with a clean dataframe containing tidier data you can use in your reports, visualisations, analyses, or models. As you’d expect, Pandas makes it very easy to drop data from dataframes and in this tutorial we’ll work through some simple practical examples to see how it’s done.

There are four main ways to drop columns or rows from a Pandas dataframe: the drop() function is used to drop columns and rows by condition; the dropna() function is used to drop rows or columns that contain NaN values or missing values; the drop_duplicates() function is used to drop rows or columns containing non-unique values when de-duping data; and the pop() method is used to drop a column and return the dropped series.

These versatile functions let you delete, remove, or drop selected columns or rows, whatever your needs, and they’re all very easy to learn to use.

Method	Use
`drop()`	The `drop()` function is used to drop columns or rows from a Pandas dataframe. Primarily, it's used to drop rows by column name or index, but can also be used to drop values that match certain criteria, such as rows that do or do not contain a certain value, or that are of a certain dtype.
`dropna()`	The `dropna()` function is used to drop rows or columns that contain missing values. It's commonly used in machine learning to remove NaN values prior to training a model.
`drop_duplicates()`	The `drop_duplicates()` function is used to drop rows that contain duplicate values. It's commonly used for de-duping or de-duplicating data.
`pop()`	The `pop()` function is used to drop a single column from a Pandas dataframe and can also be used to return the dropped series as a variable.

In this tutorial, we’ll go over the commonly used drop() function. You can find out how to use drop_duplicates() in my other post on de-duping data with Pandas.

The Pandas drop() method

First, we’ll look at the drop() method. This is the main Pandas function you’ll use for dropping columns and rows, either by name, index, or by condition. It’s really powerful.

The Pandas drop() method is used by appending the method to the name of a dataframe, i.e. df.drop() and can be used to delete or remove rows, columns or series from the original dataframe. There are some required arguments you must pass to the method. If you run df.drop() without these you’ll get back an error that says ValueError: Need to specify at least one of 'labels', 'index' or 'columns'.

Parameter	Description
`labels`	The `labels` parameter is used to specify the row or column labels of the DataFrame to be dropped. It takes either a single value or a list (or list-like).
`axis`	The `axis` parameter is used to specify the axis along which the labels are to be dropped. It takes either `0` for index or `1` for columns. The default value is `0`, so when you call `drop()` on its own it will drop via index.
`index`	The `index` parameter is used to specify the row labels of the DataFrame to be dropped. It takes either a single value or a list (or list-like). It's an alternative to specifying both `labels, axis=0` and does the same as `columns=labels`.
`columns`	The `columns` parameter is used to specify the column labels of the DataFrame to be dropped. It takes either a single value or a list (or list-like). It's an alternative to specifying both `labels, axis=1` and does the same as `columns=labels`.
`inplace`	The `inplace` parameter is used to specify whether the operation is to be performed on the DataFrame itself or a copy of it. It takes either `True` or `False`. The default value is `False`, so when you call `drop()` on its own it will return a copy of the DataFrame with the specified rows or columns dropped.

Import the packages

To get started working through these simple examples, first open a Jupyter notebook and import the Pandas and Numpy libraries.

import pandas as pd
import numpy as np

Create a Pandas dataframe

Next, either import data into a Pandas dataframe, or create a simple dataframe containing a few values to drop. If you use info() to print the details of your dataframe, you’ll see that we have a mixture of Pandas data types or dtypes in here, as well as some missing values, and some duplicates.

data = {'model': ['Defender 90', 'Defender 110', 'Defender 130'],
        'wheelbase': ['90 inches', '110 inches', '130 inches'], 
        'bhp': [140, 140, 140],
        'reliability': ['High', 'High', 'High'], 
        'desirability': ['High', 'High', 'High'], 
        'looks': ['High', 'Medium', np.NaN], 
        'price': [35000.99, 45999.99, 59999.99], 
        'year': [2016, 2016, 0]
       }
df_defenders = pd.DataFrame.from_dict(data)
df_defenders

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016
1	Defender 110	110 inches	140	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

df_defenders.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   model         3 non-null      object 
 1   wheelbase     3 non-null      object 
 2   bhp           3 non-null      int64  
 3   reliability   3 non-null      object 
 4   desirability  3 non-null      object 
 5   looks         2 non-null      object 
 6   price         3 non-null      float64
 7   year          3 non-null      int64  
dtypes: float64(1), int64(2), object(5)
memory usage: 320.0+ bytes

Drop a single column by name using labels

The labels parameter can be used to define which column or columns you want to drop. To drop a single column using labels we simply pass the column or series name as the first argument. We need to set axis=1 to tell drop() to use columns and not the index.

df = df_defenders.copy()
df = df.drop('bhp', axis=1)
df

	model	wheelbase	reliability	desirability	looks	price	year
0	Defender 90	90 inches	High	High	High	35000.99	2016
1	Defender 110	110 inches	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	High	NaN	59999.99	0

While you don’t need to specify the name of labels in your call to drop, since it’s the default argument, it makes your code more readable and easier to maintain.

df = df_defenders.copy()
df = df.drop(labels='bhp', axis=1)
df

	model	wheelbase	reliability	desirability	looks	price	year
0	Defender 90	90 inches	High	High	High	35000.99	2016
1	Defender 110	110 inches	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	High	NaN	59999.99	0

Drop multiple columns by name using labels

To drop multiple columns from the dataframe using labels you simply change the string value passed in the first argument to a list of column names, and again set axis=1 to tell the drop() method to use columns and not the index.

df = df_defenders.copy()
df = df.drop(labels=['bhp', 'reliability'], axis=1)
df

	model	wheelbase	desirability	looks	price	year
0	Defender 90	90 inches	High	High	35000.99	2016
1	Defender 110	110 inches	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	NaN	59999.99	0

Drop a single row by index using labels

The labels method can also be used to drop a single specific row by passing in its index value. To do this, I’m passing in df.index[0] to remove the row that contains the model “Defender 90”.

df = df_defenders.copy()
df = df.drop(labels=df.index[0])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
1	Defender 110	110 inches	140	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop multiple rows by index using labels

To delete multiple rows from the dataframe using their index, we can pass a list of values to the first labels argument and set axis=0 to tell drop() to use the index and not the column name.

df = df_defenders.copy()
df = df.drop(labels=[0, 1], axis=0)
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Using inplace to modify the original dataframe

The inplace parameter is used to modify the original dataframe and prevents the need to call the code and reassign the output back to the dataframe variable. For example, df.drop(labels='bhp', axis=1, inplace=True) is exactly the same as df = df.drop(labels='bhp', axis=1). The inplace parameter takes a True or False boolean value and defaults to False which returns a copy of the original dataframe with the chosen columns or rows removed, to which you can reassign back to the original dataframe variable or a new one.

df = df_defenders.copy()
df.drop(labels='bhp', axis=1, inplace=True)
df

	model	wheelbase	reliability	desirability	looks	price	year
0	Defender 90	90 inches	High	High	High	35000.99	2016
1	Defender 110	110 inches	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	High	NaN	59999.99	0

Drop a single column by name using columns

The columns parameter is the one you’ll see data scientists use most often, because it’s the most concise. At first glance, it might look like it does the same thing as labels, and it kind of does, but it’s not quite the same. Since the labels method requires you to set axis=1 to tell the drop() method you’re referring to columns not indices, you need to enter df1 = df1.drop('bhp', axis=1) to drop a single column, but with columns you can reduce this to df = df.drop(columns='bhp').

df = df_defenders.copy()
df = df.drop(columns='bhp')
df

	model	wheelbase	reliability	desirability	looks	price	year
0	Defender 90	90 inches	High	High	High	35000.99	2016
1	Defender 110	110 inches	High	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	High	NaN	59999.99	0

Drop multiple columns by name using columns

As with labels, you can also pass in a list of column names to drop multiple columns at once. However, there’s no need to provide axis=1, as Pandas knows you’re dropping columns by name.

df = df_defenders.copy()
df = df.drop(columns=['bhp', 'reliability'])
df

	model	wheelbase	desirability	looks	price	year
0	Defender 90	90 inches	High	High	35000.99	2016
1	Defender 110	110 inches	High	Medium	45999.99	2016
2	Defender 130	130 inches	High	NaN	59999.99	0

Drop columns by dtype using select_dtypes()

The Pandas select_dtypes() function is also compatible with the drop() function and can be used to drop columns based on their Pandas dtype or data type. To use select_dtypes() to drop Pandas columns according to their dtype you need to pass the function to the columns parameter of drop with the include argument set to the dtype you wish to include, and then append the Pandas columns method to the end.

Drop columns containing int

For example, df.drop(columns=df.select_dtypes(include='int').columns) will use select_dtypes() to select all columns that have an int or integer dtype and then return a list of their column names to the columns parameter of drop().

df = df_defenders.copy()
df = df.drop(columns=df.select_dtypes(include='int').columns)
df

	model	wheelbase	reliability	desirability	looks	price
0	Defender 90	90 inches	High	High	High	35000.99
1	Defender 110	110 inches	High	High	Medium	45999.99
2	Defender 130	130 inches	High	High	NaN	59999.99

Drop columns containing float

You can, of course, substitute the int dtype with any Pandas dtype. For example, float for floating point or decimal numbers, or object for strings.

df = df_defenders.copy()
df = df.drop(columns=df.select_dtypes(include='float').columns)
df

	model	wheelbase	bhp	reliability	desirability	looks	year
0	Defender 90	90 inches	140	High	High	High	2016
1	Defender 110	110 inches	140	High	High	Medium	2016
2	Defender 130	130 inches	140	High	High	NaN	0

Drop columns containing object or string data

To drop the columns that contain strings or object dtype values you can pass include='object' to select_dtypes().

df = df_defenders.copy()
df = df.drop(columns=df.select_dtypes(include='object').columns)
df

	bhp	price	year
0	140	35000.99	2016
1	140	45999.99	2016
2	140	59999.99	0

Drop columns containing numeric data using np.number

If you want to drop all columns that contain numeric data you can use Numpy’s np.number method to return all numeric data and then pass that to select_dtypes() and pass that to drop().

df = df_defenders.copy()
df = df.drop(columns=df.select_dtypes(include=np.number).columns)
df

	model	wheelbase	reliability	desirability	looks
0	Defender 90	90 inches	High	High	High
1	Defender 110	110 inches	High	High	Medium
2	Defender 130	130 inches	High	High	NaN

Drop columns containing non-numeric data using np.number

To drop all non-numeric columns and return only the numeric columns, you can use the exclude argument when calling select_dtypes(). This gives us a dataframe containing only the float or int dtypes, with the object columns dropped.

df = df_defenders.copy()
df = df.drop(columns=df.select_dtypes(exclude=np.number).columns)
df

	bhp	price	year
0	140	35000.99	2016
1	140	45999.99	2016
2	140	59999.99	0

Drop columns where a substring exists in a column name

Pandas also lets you drop columns by condition. There are loads of ways to do this. First, we’ll look at dropping column names that contain a given substring in the column name. We can do this by creating a for loop and looping over the column names and return the name if they contain the substring, then return the list of columns containing the substring to drop() to remove them.

df = df_defenders.copy()
df = df.drop(columns=[col for col in df.columns if 'y' in col])
df

	model	wheelbase	bhp	looks	price
0	Defender 90	90 inches	140	High	35000.99
1	Defender 110	110 inches	140	Medium	45999.99
2	Defender 130	130 inches	140	NaN	59999.99

Drop columns where a column value contains a string

You can also drop Pandas columns by condition where the column value contains a string or substring. This works just like the code above, but calls df[col].values instead.

df = df_defenders.copy()
df = df.drop(columns=[col for col in df.columns if 'High' in df[col].values])
df

<ipython-input-18-1e45f1ff6166>:2: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  df = df.drop(columns=[col for col in df.columns if 'High' in df[col].values])

	model	wheelbase	bhp	price	year
0	Defender 90	90 inches	140	35000.99	2016
1	Defender 110	110 inches	140	45999.99	2016
2	Defender 130	130 inches	140	59999.99	0

Drop rows where a column value contains a string

You can also drop Pandas dataframe rows by condition where a column value contains a string. To do this, you loop over the rows, look for the substring in the row values, and return the list of matches to the index argument of drop() and the matching rows will be removed.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 'Medium' in df.loc[row].values])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop rows where a value is less than a specific value

If you’re working with numeric data, such as int or float dtypes, you can drop rows where a column value is greater than a target value by looping over the rows and returning the index of those matching rows to index when calling drop().

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 50000 > df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop rows where a value is greater than a specific value

The same approach can be used to drop rows where the column value is greater than a given value.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 50000 < df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016
1	Defender 110	110 inches	140	High	High	Medium	45999.99	2016

Drop rows where a value is equal to a specific value

To drop dataframe rows where a column value is equal to a specific value you can also use the for loop approach to return the index of matching rows to index when calling drop().

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 45999.99 == df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop rows where a value is less than or equal to a specific value

To drop rows where a value is less than or equal to a specific value you use the <= operator.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 45999.99 <= df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016

Drop rows where a value is greater than or equal to a threshold

To drop rows where a value is greater than or equal to a threshold value you use the >= operator.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 45999.99 >= df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop rows where a value is not equal to a specific value

Similarly, to drop rows where a column value is not equal to a given value you can simply switch from the == operator to the != operator, for “not equals”.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 45999.99 != df.loc[row, 'price']])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
1	Defender 110	110 inches	140	High	High	Medium	45999.99	2016

Drop rows where a value falls between two values

To drop those rows where a column value falls between two numeric values you can call the values separately, for example row for row in df.index if 35000 < df.loc[row, 'price'] < 50000 and then pass the indices of the matching rows to the drop() function’s index parameter.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 35000 < df.loc[row, 'price'] < 50000])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
2	Defender 130	130 inches	140	High	High	NaN	59999.99	0

Drop rows with 0 in any column

If you want to drop any columns that contain a zero you can use col for col in df.columns if 0 in df[col].values.

df = df_defenders.copy()
df = df.drop(columns=[col for col in df.columns if 0 in df[col].values])
df

	model	wheelbase	bhp	reliability	desirability	looks	price
0	Defender 90	90 inches	140	High	High	High	35000.99
1	Defender 110	110 inches	140	High	High	Medium	45999.99
2	Defender 130	130 inches	140	High	High	NaN	59999.99

Drop rows with a 0 in any row value

To drop any rows that contain a zero in the row values you can use row for row in df.index if 0 in df.loc[row].values.

df = df_defenders.copy()
df = df.drop(index=[row for row in df.index if 0 in df.loc[row].values])
df

	model	wheelbase	bhp	reliability	desirability	looks	price	year
0	Defender 90	90 inches	140	High	High	High	35000.99	2016
1	Defender 110	110 inches	140	High	High	Medium	45999.99	2016

Drop columns not in list

A commonly used snippet in machine learning is to drop all columns except those in a list. This can be done a couple of ways, but the not in for loop is an easy one.

df = df_defenders.copy()
df = df.drop(columns=[col for col in df.columns if col not in ['model', 'price']])
df

	model	price
0	Defender 90	35000.99
1	Defender 110	45999.99
2	Defender 130	59999.99

Matt Clarke, Thursday, November 24, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.