How to use isna() to check for missing values in a Pandas dataframe

The Pandas isna() function is used to check for missing values in dataframe columns. Here's how to use it.

How to use isna() to check for missing values in a Pandas dataframe
Picture by Pixabay, Pexels.
10 minutes to read

Real world data is rarely clean, and you’ll often encounter missing values when working with Pandas dataframes. Missing values can lead to errors in your code, and can cause models to fail or be inaccurate, so it’s vital that you check for them.

In this project, I’ll show you how you can use the Pandas isna() function to check for missing values in your dataframe, and in specific columns, and how to create a helper function that determines the percentage of missing values overall, so you can decide on the best approach to dealing with them.

Import the packages

To get started, open a Jupyter notebook and import the Pandas package. You’ll likely already have Pandas installed, but if you don’t you can install it by entering !pip3 install pandas in a Jupyter cell or pip3 install pandas in your terminal.

import pandas as pd

Load the data

Next, we’ll load a dataset into Pandas that includes some missing values. I’ve used a dataset on food consumption from OpenMV, but you can use any data you like.

If you run df.info() on the dataframe you’ll see that it contains an object column containing the country name, and then a load of numeric data columns containing information on food consumption.

df = pd.read_csv('food-consumption.csv')
df.head()
Country Real coffee Instant coffee Tea Sweetener Biscuits Powder soup Tin soup Potatoes Frozen fish ... Apples Oranges Tinned fruit Jam Garlic Butter Margarine Olive oil Yoghurt Crisp bread
0 Germany 90 49 88 19.0 57.0 51 19 21 27 ... 81 75 44 71 22 91 85 74 30.0 26
1 Italy 82 10 60 2.0 55.0 41 3 2 4 ... 67 71 9 46 80 66 24 94 5.0 18
2 France 88 42 63 4.0 76.0 53 11 23 11 ... 87 84 40 45 88 94 47 36 57.0 3
3 Holland 96 62 98 32.0 62.0 67 43 7 14 ... 83 89 61 81 15 31 97 13 53.0 15
4 Belgium 94 38 48 11.0 74.0 37 23 9 13 ... 76 76 42 57 29 84 80 83 20.0 5

5 rows × 21 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 21 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country         16 non-null     object 
 1   Real coffee     16 non-null     int64  
 2   Instant coffee  16 non-null     int64  
 3   Tea             16 non-null     int64  
 4   Sweetener       15 non-null     float64
 5   Biscuits        15 non-null     float64
 6   Powder soup     16 non-null     int64  
 7   Tin soup        16 non-null     int64  
 8   Potatoes        16 non-null     int64  
 9   Frozen fish     16 non-null     int64  
 10  Frozen veggies  16 non-null     int64  
 11  Apples          16 non-null     int64  
 12  Oranges         16 non-null     int64  
 13  Tinned fruit    16 non-null     int64  
 14  Jam             16 non-null     int64  
 15  Garlic          16 non-null     int64  
 16  Butter          16 non-null     int64  
 17  Margarine       16 non-null     int64  
 18  Olive oil       16 non-null     int64  
 19  Yoghurt         15 non-null     float64
 20  Crisp bread     16 non-null     int64  
dtypes: float64(3), int64(17), object(1)
memory usage: 2.8+ KB

Check for missing values

The easiest way to check for missing values in a Pandas dataframe is via the isna() function. The isna() function returns a boolean (True or False) value if the Pandas column value is missing, so if you run df.isna() you’ll get back a dataframe showing you a load of boolean values.

df.isna().head()
Country Real coffee Instant coffee Tea Sweetener Biscuits Powder soup Tin soup Potatoes Frozen fish ... Apples Oranges Tinned fruit Jam Garlic Butter Margarine Olive oil Yoghurt Crisp bread
0 False False False False False False False False False False ... False False False False False False False False False False
1 False False False False False False False False False False ... False False False False False False False False False False
2 False False False False False False False False False False ... False False False False False False False False False False
3 False False False False False False False False False False ... False False False False False False False False False False
4 False False False False False False False False False False ... False False False False False False False False False False

5 rows × 21 columns

That’s not usually very useful, so instead we’ll calculate the sum() of missing values by running df.isna().sum(). This returns the columns in our Pandas dataframe along with the number of missing values detected in each one, so 0 means there are no missing values, and 1 means there is a single missing value.

df.isna().sum()
Country           0
Real coffee       0
Instant coffee    0
Tea               0
Sweetener         1
Biscuits          1
Powder soup       0
Tin soup          0
Potatoes          0
Frozen fish       0
Frozen veggies    0
Apples            0
Oranges           0
Tinned fruit      0
Jam               0
Garlic            0
Butter            0
Margarine         0
Olive oil         0
Yoghurt           1
Crisp bread       0
dtype: int64

You can also run this code on a single Pandas column, or series, or a subset of columns, if you don’t need to check the entire dataframe.

df['Sweetener'].isna().sum()
1
df[['Sweetener', 'Frozen fish']].isna().sum()
Sweetener      1
Frozen fish    0
dtype: int64

Create a helper function to summarise missing data

To take this a step further, we can create a reusable helper function you can use in your projects. We’ll call this show_missing() and we’ll loop through each column in the dataframe and return some useful data on each one.

We’ll detect the Pandas dtypes, count the number of values, count the number of unique values, count the number of missing values, and calculate the percentage of values that are missing overall.

def show_missing(df):
    """Return a Pandas dataframe describing the contents of a source dataframe including missing values."""
    
    variables = []
    dtypes = []
    count = []
    unique = []
    missing = []
    pc_missing = []
    
    for item in df.columns:
        variables.append(item)
        dtypes.append(df[item].dtype)
        count.append(len(df[item]))
        unique.append(len(df[item].unique()))
        missing.append(df[item].isna().sum())
        pc_missing.append(round((df[item].isna().sum() / len(df[item])) * 100, 2))

    output = pd.DataFrame({
        'variable': variables, 
        'dtype': dtypes,
        'count': count,
        'unique': unique,
        'missing': missing, 
        'pc_missing': pc_missing
    })    
        
    return output
show_missing(df)
variable dtype count unique missing pc_missing
0 Country object 16 16 0 0.00
1 Real coffee int64 16 14 0 0.00
2 Instant coffee int64 16 15 0 0.00
3 Tea int64 16 15 0 0.00
4 Sweetener float64 16 14 1 6.25
5 Biscuits float64 16 14 1 6.25
6 Powder soup int64 16 14 0 0.00
7 Tin soup int64 16 13 0 0.00
8 Potatoes int64 16 11 0 0.00
9 Frozen fish int64 16 15 0 0.00
10 Frozen veggies int64 16 13 0 0.00
11 Apples int64 16 14 0 0.00
12 Oranges int64 16 15 0 0.00
13 Tinned fruit int64 16 15 0 0.00
14 Jam int64 16 16 0 0.00
15 Garlic int64 16 13 0 0.00
16 Butter int64 16 15 0 0.00
17 Margarine int64 16 13 0 0.00
18 Olive oil int64 16 15 0 0.00
19 Yoghurt float64 16 14 1 6.25
20 Crisp bread int64 16 15 0 0.00

Dealing with missing values is a subject in its own right. There are various approaches you can use which range from imputing missing values to dropping them entirely.

Matt Clarke, Friday, September 02, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.