When working with Pandas, you’ll often need to identify and count unique values in a DataFrame. This is a common task in data science, and Pandas provides two methods to help you do this: unique() and nunique(). In this quick tutorial, you’ll learn how to use these methods to identify and count unique values in a Pandas DataFrame.
To get started, open a Jupyter notebook and import the Pandas package.
import pandas as pd
Next, either load the data you want to examine into a Pandas dataframe, or create a new dataframe containing some duplicate values.
data = [{'species': 'Esox lucius', 'length': 120, 'weight': 8.1, 'age': 3},
        {'species': 'Esox lucius', 'length': 100, 'weight': 7.7, 'age': 2},
        {'species': 'Esox lucius', 'length': 110, 'weight': 7.9, 'age': 2},
        {'species': 'Cyprinus carpio', 'length': 56, 'weight': 8.3, 'age': 13},
        {'species': 'Cyprinus carpio', 'length': 36, 'weight': 7.9, 'age': 23},
        {'species': 'Cyprinus carpio', 'length': 46, 'weight': 8.1, 'age': 13},
        {'species': 'Salmo trutta', 'length': 40, 'weight': 7.5, 'age': 5},
        {'species': 'Salmo trutta', 'length': 38, 'weight': 7.4, 'age': 4},
        {'species': 'Oncorhynchus mykiss', 'length': 42, 'weight': 7.6, 'age': 5},
        {'species': 'Salmo salar', 'length': 44, 'weight': 7.7, 'age': 5}]
df = pd.DataFrame(data)
df
| | species | length | weight | age |
|---|---|---|---|---|
| 0 | Esox lucius | 120 | 8.1 | 3 |
| 1 | Esox lucius | 100 | 7.7 | 2 |
| 2 | Esox lucius | 110 | 7.9 | 2 |
| 3 | Cyprinus carpio | 56 | 8.3 | 13 |
| 4 | Cyprinus carpio | 36 | 7.9 | 23 |
| 5 | Cyprinus carpio | 46 | 8.1 | 13 |
| 6 | Salmo trutta | 40 | 7.5 | 5 |
| 7 | Salmo trutta | 38 | 7.4 | 4 |
| 8 | Oncorhynchus mykiss | 42 | 7.6 | 5 |
| 9 | Salmo salar | 44 | 7.7 | 5 |
To select the unique values from a specific column in a Pandas dataframe you can use the unique() method. You call it on the column itself, e.g. df['column_name'].unique(), and it returns an array of the unique values (a NumPy array in the output below, not a Python list), in the order in which they first appear.
# Select unique values from the species column
df['species'].unique()
array(['Esox lucius', 'Cyprinus carpio', 'Salmo trutta',
'Oncorhynchus mykiss', 'Salmo salar'], dtype=object)
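Because unique() returns an array rather than a Python list, you may want to convert the result when a plain list is needed. The following is a minimal sketch, assuming a cut-down version of the dataframe above; the variable names species_values and species_list are only illustrative.

```python
import pandas as pd

# Cut-down version of the example dataframe (species column only)
df = pd.DataFrame({'species': ['Esox lucius', 'Esox lucius', 'Cyprinus carpio',
                               'Salmo trutta', 'Salmo trutta', 'Salmo salar']})

# unique() returns an array of values in order of first appearance
species_values = df['species'].unique()
print(type(species_values))

# Convert to a regular Python list if that is what you need
species_list = species_values.tolist()
print(species_list)
```

Note that unique() is a method of a column (a Series), so it is applied to one column at a time rather than to the whole dataframe.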
To count the number of unique values in a specific column in a Pandas dataframe you can use the nunique() method. As with unique(), you call it on the column, e.g. df['column_name'].nunique(), and it returns an integer representing the number of unique values. Missing values are excluded from the count by default.
# Count the number of unique values in the species column
df['species'].nunique()
5
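Unlike unique(), nunique() can also be called on the whole dataframe, where it returns one count per column. The sketch below uses a small illustrative dataframe rather than the one above; the missing age value is an assumption added only to show how the dropna parameter changes the count.

```python
import numpy as np
import pandas as pd

# Small illustrative dataframe; the missing age is an assumption for this sketch
df = pd.DataFrame({
    'species': ['Esox lucius', 'Esox lucius', 'Cyprinus carpio', 'Salmo trutta'],
    'age': [3, 2, np.nan, 2],
})

# Called on the whole dataframe, nunique() returns one count per column
counts = df.nunique()
print(counts)

# Missing values are ignored by default; dropna=False counts NaN as a value
age_without_nan = df['age'].nunique()
age_with_nan = df['age'].nunique(dropna=False)
print(age_without_nan, age_with_nan)
```

If your data contains missing values, it is worth deciding up front whether they should count as a distinct value, since the two calls give different answers.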
Matt Clarke, Saturday, November 12, 2022