How to identify and count unique values in Pandas

Learn how to use the Pandas unique() and nunique() methods to identify and count unique values in a Pandas DataFrame.

Picture by Tim Gouw, Pexels.

When working with Pandas, you’ll often need to identify and count unique values in a DataFrame. This is a common task in data science, and Pandas provides two methods to help you do this: unique() and nunique(). In this quick tutorial, you’ll learn how to use these methods to identify and count unique values in a Pandas DataFrame.

Import the packages

To get started, open a Jupyter notebook and import the Pandas package.

import pandas as pd

Create a dataframe containing some duplicate values

Next, either load the data you want to examine into a Pandas dataframe, or create a new dataframe containing some duplicate values.

data = [{'species': 'Esox lucius', 'length': 120, 'weight': 8.1, 'age': 3}, 
        {'species': 'Esox lucius', 'length': 100, 'weight': 7.7, 'age': 2},
        {'species': 'Esox lucius', 'length': 110, 'weight': 7.9, 'age': 2},
        {'species': 'Cyprinus carpio', 'length': 56, 'weight': 8.3, 'age': 13},
        {'species': 'Cyprinus carpio', 'length': 36, 'weight': 7.9, 'age': 23},
        {'species': 'Cyprinus carpio', 'length': 46, 'weight': 8.1, 'age': 13},
        {'species': 'Salmo trutta', 'length': 40, 'weight': 7.5, 'age': 5},
        {'species': 'Salmo trutta', 'length': 38, 'weight': 7.4, 'age': 4},
        {'species': 'Oncorhynchus mykiss', 'length': 42, 'weight': 7.6, 'age': 5},
        {'species': 'Salmo salar', 'length': 44, 'weight': 7.7, 'age': 5}]

df = pd.DataFrame(data)
df
               species  length  weight  age
0          Esox lucius     120     8.1    3
1          Esox lucius     100     7.7    2
2          Esox lucius     110     7.9    2
3      Cyprinus carpio      56     8.3   13
4      Cyprinus carpio      36     7.9   23
5      Cyprinus carpio      46     8.1   13
6         Salmo trutta      40     7.5    5
7         Salmo trutta      38     7.4    4
8  Oncorhynchus mykiss      42     7.6    5
9          Salmo salar      44     7.7    5

Select unique values from a specific column

To select the unique values from a specific column in a Pandas dataframe, use the unique() method. Append it to the column selection, e.g. df['column_name'].unique(). For a text column such as species, it returns a NumPy array (not a Python list) of the unique values, in the order in which they first appear.

# Select unique values from the species column
df['species'].unique()
array(['Esox lucius', 'Cyprinus carpio', 'Salmo trutta',
       'Oncorhynchus mykiss', 'Salmo salar'], dtype=object)
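If you need an ordinary Python list rather than an array, one option is to wrap the result in list(). The snippet below is a minimal sketch using a small, made-up Series (the variable names are illustrative and not part of the dataframe above); it also sorts the values, since unique() keeps them in order of first appearance rather than alphabetical order.

```python
import pandas as pd

# A small, hypothetical column with repeated values
species = pd.Series(['Esox lucius', 'Esox lucius', 'Cyprinus carpio',
                     'Salmo trutta', 'Cyprinus carpio'])

# unique() keeps the order in which values first appear
unique_list = list(species.unique())

# Sort separately if you want alphabetical order
unique_sorted = sorted(unique_list)

print(unique_list)
print(unique_sorted)
```

The same pattern should work on a dataframe column, e.g. list(df['species'].unique()).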

Count the number of unique values in a specific column

To count the number of unique values in a specific column in a Pandas dataframe, use the nunique() method. As with unique(), append it to the column selection, e.g. df['column_name'].nunique(). It returns an integer representing the number of unique values, with missing values excluded by default.

# Count the number of unique values in the species column
df['species'].nunique()
5
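The nunique() method can also be called on a whole dataframe, in which case it returns one count per column. The sketch below uses a small, made-up dataframe (not the one above) that includes a missing value, to show how the dropna argument changes the count.

```python
import pandas as pd

# A small, hypothetical dataframe with one missing species
df_example = pd.DataFrame({
    'species': ['Esox lucius', 'Esox lucius', 'Salmo trutta', None],
    'age': [3, 2, 2, 5],
})

# Calling nunique() on the dataframe gives a Series with one count per column
per_column = df_example.nunique()

# By default, missing values are not counted as a unique value
species_count = df_example['species'].nunique()

# Pass dropna=False to count the missing value as well
species_with_na = df_example['species'].nunique(dropna=False)

print(per_column)
print(species_count, species_with_na)
```

Whether a missing value should count as its own category depends on the analysis, so it is worth checking for missing data before relying on the default.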

Matt Clarke, Saturday, November 12, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.