How to find the most common value in a Pandas dataframe column

Picture by Monoar Rahman, Pexels.

When working with categorical data in Pandas dataframes, it helps to know how many times each value appears, that is, its frequency. (This is different from "cardinality", which is the number of unique values in a column.) The Pandas value_counts() function is ideal for counting frequencies.

The value_counts() function returns a Series whose index holds each unique value in the column and whose values hold the number of times each one occurs. However, what if you want to extract the most common value itself, rather than the full frequency distribution?

As you might expect, Pandas includes a variety of functions that can be used to determine the most common value in a dataframe column. In this quick tutorial, we’ll go over some code examples that show you how to find the most common value using value_counts(), mode(), idxmax(), and nlargest().

Load Pandas and import the data

To get started, open a Jupyter notebook and import some data into a Pandas dataframe. I’ve created a CSV file of Google Analytics data that you can use if you don’t have a suitable dataset of your own.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/google-analytics.csv')
df.head()
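If you can't download the CSV, a small in-memory dataframe is enough to follow along. The sketch below is only a stand-in: the column names mirror a few of those in the tutorial's dataset, but the rows are invented for illustration and the name `sample` is hypothetical.

```python
import pandas as pd

# Invented rows for illustration only; not taken from the real dataset.
sample = pd.DataFrame({
    "Browser": ["Chrome", "Safari", "Chrome", "Edge", "Chrome", "Safari"],
    "Device Category": ["desktop", "mobile", "desktop", "desktop", "mobile", "mobile"],
    "Pageviews": [1, 2, 1, 3, 1, 1],
})

# Preview the data, as with df.head() above
print(sample.head())

# Number of distinct browsers in the column
print(sample["Browser"].nunique())
```

You could substitute `sample` for `df` in the later examples, although the counts will obviously differ from the ones shown in this tutorial.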

Use value_counts() to examine the data

First, we’ll use the Pandas value_counts() function to examine the distribution of the data in the Browser column. This column holds a categorical variable: the name of the web browser used to access the site.

To use value_counts(), you simply call it on the dataframe column. Optionally, you can chain the to_frame() method to convert the resulting Series to a dataframe. Running it shows us that Chrome is the most widely used browser, with 6869 occurrences in the Browser column.

df['Browser'].value_counts().to_frame()

	Browser
Chrome	6869
Safari	1379
Edge	817
Samsung Internet	321
Amazon Silk	216
Firefox	177
Internet Explorer	130
Android Webview	45
Android Browser	16
Opera	10
Safari (in-app)	10
Opera Mini	3
Playstation 4	2
awin.com - site screen shotter	2
UC Browser	1
Mozilla Compatible Agent	1
Iron	1
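Raw counts are sometimes easier to interpret as proportions. As a minimal sketch, assuming a small made-up Series in place of the real Browser column, the `normalize=True` argument of value_counts() returns each value's share of the total instead of its count.

```python
import pandas as pd

# Hypothetical sample data standing in for the Browser column
browsers = pd.Series(
    ["Chrome", "Chrome", "Chrome", "Safari", "Safari", "Edge"],
    name="Browser",
)

# Absolute counts, sorted from most to least common by default
counts = browsers.value_counts()

# Relative frequencies: proportions that sum to 1
shares = browsers.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

Because value_counts() sorts in descending order by default, the most common value always sits in the first row of either result.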

Find the most common values in a column with idxmax()

Next, we’ll extract only the most common value in the column by using value_counts() and idxmax() together. The value_counts() function counts the number of times each value appears, and the idxmax() function returns the index label of the highest count, which in this case is the browser name.

df['Browser'].value_counts().idxmax()

'Chrome'
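If you also want to know how often the top value occurs, you can pair idxmax() with max() on the same counts. This is a small sketch on invented data; the variable names are just for illustration.

```python
import pandas as pd

# Hypothetical sample data
browsers = pd.Series(["Chrome", "Safari", "Chrome", "Edge", "Chrome", "Safari"])

counts = browsers.value_counts()

top_browser = counts.idxmax()  # index label of the highest count
top_count = counts.max()       # the highest count itself

print(top_browser, top_count)
```

Computing the counts once and reusing them avoids running value_counts() twice on a large column.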

Find the most common values in a column with mode()

We can also find the most common value in a Pandas dataframe column using the mode() function. You can run the mode() function on an entire dataframe using df.mode() and it will return the most common value in each column.

df.mode()

	User Type	Source	Medium	Browser	Device Category	Date	Pageviews
0	New Visitor	google	organic	Chrome	desktop	2020-08-03	1
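One thing to be aware of is that df.mode() can return more than one row. If any column has several values tied for most common, the result has one row per tied value, and the other columns are padded with NaN. The sketch below uses invented data to show this, and takes the first row with iloc[0] to get a single mode per column.

```python
import pandas as pd

# Hypothetical data: Browser has one clear mode, Device has a three-way tie
df_small = pd.DataFrame({
    "Browser": ["Chrome", "Chrome", "Safari"],
    "Device": ["desktop", "mobile", "tablet"],
})

modes = df_small.mode()
print(modes)  # three rows; Browser is NaN in the second and third

# First mode of every column, returned as a Series
first_modes = modes.iloc[0]
print(first_modes)
```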

You can also call mode() on a single column. For example, df['Browser'].mode() will return the most common browser. Since this returns a Series rather than the value itself (there can be more than one mode if values tie), you need to append [0] to the end to extract the first value.

df['Browser'].mode()

0    Chrome
dtype: object

df['Browser'].mode()[0]

'Chrome'
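Two edge cases are worth keeping in mind with mode(): ties return several values, and an empty column returns an empty Series, so [0] would raise an error. Here is a hedged sketch of a small helper that handles both; `most_common` is a hypothetical name, not a Pandas function, and returning None for empty input is just one possible choice.

```python
import pandas as pd

def most_common(series):
    """Return the first mode of a Series, or None if it has no values."""
    modes = series.mode()
    return modes.iloc[0] if not modes.empty else None

# Hypothetical data where Chrome and Safari are tied
tied = pd.Series(["Chrome", "Safari", "Chrome", "Safari", "Edge"])

print(tied.mode().tolist())  # both tied values, in sorted order
print(most_common(tied))     # the first of the tied values
print(most_common(pd.Series([], dtype="object")))  # None for an empty column
```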

Find the most common values in a column with nlargest()

Finally, there’s nlargest(). As the name suggests, nlargest(n) returns the n largest values in a Series, so when you use it with value_counts() it returns the n most common values, ranked by their number of occurrences.

For example, df['Browser'].value_counts().nlargest(3) will return the three most commonly seen browsers and the number of occurrences.

df['Browser'].value_counts().nlargest(3)

Chrome    6869
Safari    1379
Edge       817
Name: Browser, dtype: int64

Since the browser name is stored in index[0] and the number of occurrences is stored in values[0], we can use df['Browser'].value_counts().nlargest(1).index[0] to get the most common browser name.

df['Browser'].value_counts().nlargest(1).index[0]

'Chrome'
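To pull out both the name and the count of the top value, you can read the index and the value from the same one-row result. The following is a minimal sketch on invented data. It also checks that, since value_counts() already sorts in descending order by default, taking the first index label gives the same answer here.

```python
import pandas as pd

# Hypothetical sample data
browsers = pd.Series(["Chrome"] * 4 + ["Safari"] * 2 + ["Edge"])

top = browsers.value_counts().nlargest(1)

name = top.index[0]       # the most common browser
count = int(top.iloc[0])  # how many times it occurs

print(name, count)

# value_counts() is sorted by default, so this gives the same name
same_name = browsers.value_counts().index[0]
print(same_name == name)
```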

Matt Clarke, Saturday, November 26, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.