How to use the Pandas query() function

Learn how to use the Pandas query() function to search, filter, or subset a dataframe to show only those rows that match your search criteria.

How to use the Pandas query() function
Picture by Ciao, Pexels.
13 minutes to read

The Pandas query() function is an awesome tool for filtering Pandas dataframes. It takes simple string arguments on column names and uses standard Pandas operators that allow you to easily search, filter, or subset a dataframe, so it only shows rows that match your search criteria.

In this simple tutorial I’ll show you how easy it is to use the query() function to filter a dataframe by working through some basic example codes highlighting its key features.

Create a Pandas dataframe

To get started, open a Jupyter notebook and import the Pandas library, then either import your own dataset, or create a Pandas dataframe to practice querying. We’ll make a simple dataframe that contains a mixture of categorical and numerical variables.

import pandas as pd

df = pd.DataFrame(
    [('Pterophyllum altum', 'Cichlidae', 15, 12.3),
        ('Pterophyllum scalare', 'Cichlidae', 10, 9.2),
        ('Pterophyllum leopoldi', 'Cichlidae', 8, 6.5), 
        ('Corydoras paleatus', 'Callichthyidae', 5, 0.5),
        ('Corydoras aeneus', 'Callichthyidae', 7, 2.5),
        ('Corydoras trilineatus', 'Callichthyidae', 5, 1.5),
        ('Corydoras habrosus', 'Callichthyidae', 3, 0.5),
    ],
    columns=['species', 'family', 'length', 'weight']
)
df
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5
3 Corydoras paleatus Callichthyidae 5 0.5
4 Corydoras aeneus Callichthyidae 7 2.5
5 Corydoras trilineatus Callichthyidae 5 1.5
6 Corydoras habrosus Callichthyidae 3 0.5

Find all rows matching a column value

First, we’ll use query() to filter the dataframe to return only the rows where the species column contains a string that exactly matches == the string “Corydoras aeneus”. This returns a new dataframe, so you can assign it to a new variable if you need to, or chain other commands to it using method chaining.

df.query('species == "Corydoras aeneus"')
species family length weight
4 Corydoras aeneus Callichthyidae 7 2.5

Find all rows that do not match a column value

To use query() to filter the dataframe so it shows only those rows where the species column does not match a string we use the != or “not equals” operator.

df.query('family != "Cichlidae"')
species family length weight
3 Corydoras paleatus Callichthyidae 5 0.5
4 Corydoras aeneus Callichthyidae 7 2.5
5 Corydoras trilineatus Callichthyidae 5 1.5
6 Corydoras habrosus Callichthyidae 3 0.5

Find all rows where a column value is in a list

Next, we’ll use query() to filter the dataframe to return only those rows where the species column contains a value that matches a value from a list. We can do this in two ways. For shorter lists, you can pass the list directly into the query() command, but for longer lists you can bring in an externally defined list by prefixing the variable name with @.

df.query('species in ["Corydoras paleatus", "Corydoras aeneus"]')
species family length weight
3 Corydoras paleatus Callichthyidae 5 0.5
4 Corydoras aeneus Callichthyidae 7 2.5
targets = ['Corydoras trilineatus', 'Corydoras aeneus']
df.query('species in @targets')
species family length weight
4 Corydoras aeneus Callichthyidae 7 2.5
5 Corydoras trilineatus Callichthyidae 5 1.5

Find all rows where a column value is not in a list

To return only those rows where a column value is not in a list we need to use the not in operator instead of the in operator. As with before, you can either define the list within the query() function or refer to the variable using an @ prefix on the list variable name.

df.query('species not in ["Corydoras paleatus", "Corydoras aeneus"]')
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5
5 Corydoras trilineatus Callichthyidae 5 1.5
6 Corydoras habrosus Callichthyidae 3 0.5
targets = ['Corydoras trilineatus', 'Corydoras aeneus']
df.query('species not in @targets')
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5
3 Corydoras paleatus Callichthyidae 5 0.5
6 Corydoras habrosus Callichthyidae 3 0.5

Find all rows that have a value greater than x

The query() function also supports the Pandas operators that work with numeric data. For example, to filter the dataframe based on column values that are less than x we can use df.query('weight > 9').

df.query('weight > 9')
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3
1 Pterophyllum scalare Cichlidae 10 9.2

Find all rows that have a value less than x

Similarly, we can filter the dataframe to show only those rows where a value is less than x by using the less than operator.

df.query('weight < 9')
species family length weight
2 Pterophyllum leopoldi Cichlidae 8 6.5
3 Corydoras paleatus Callichthyidae 5 0.5
4 Corydoras aeneus Callichthyidae 7 2.5
5 Corydoras trilineatus Callichthyidae 5 1.5
6 Corydoras habrosus Callichthyidae 3 0.5

Find rows greater than or equal to a value

To return only those rows where values in the weight column are greater than or equal to a given value we need to use the >= operator.

df.query('weight >= 9.3')
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3

Find rows less than or equal to a value

To return the rows that contain a value less than or equal to a given value we need to use the <= operator.

df.query('weight <= 1.5')
species family length weight
3 Corydoras paleatus Callichthyidae 5 0.5
5 Corydoras trilineatus Callichthyidae 5 1.5
6 Corydoras habrosus Callichthyidae 3 0.5

Find rows that have a value between two values

To return rows that have a value that is between two values, such as between 5 and 12, we can use the and operator to separate two separate queries.

df.query('weight > 5 and weight < 12')
species family length weight
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5

Find rows that match multiple conditions

Finally, to find rows that match multiple conditions, you can mix and match the various queries to construct a more complex search pattern using one or more and operators to separate the individual queries.

df.query('family == "Cichlidae" and length > 3')
species family length weight
0 Pterophyllum altum Cichlidae 15 12.3
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5
df.query('family == "Cichlidae" and length > 3 and weight < 10')
species family length weight
1 Pterophyllum scalare Cichlidae 10 9.2
2 Pterophyllum leopoldi Cichlidae 8 6.5

Matt Clarke, Monday, January 09, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.