How to use the Pandas value_counts() function

The Pandas value_counts() function is great for calculating the number of occurrences of a value in a column and for calculating frequency distributions.

How to use the Pandas value_counts() function
Picture by Marcus Spiske, Unsplash.
6 minutes to read

The Pandas value_counts() function can be used to count the number of times a value occurs within a dataframe column or series, as well as calculating frequency distributions. Here’s a quick guide to using it in your Exploratory Data Analysis.

Load the data

Open a Jupyter notebook, import pandas as pd and then use the read_csv() function to load up some data. I’m using a Google Analytics dataset containing some traffic from one of my other sites.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/google-analytics.csv')
df.head()
User Type Source Medium Browser Device Category Date Pageviews
0 New Visitor (direct) (none) Amazon Silk mobile 2020-07-31 3
1 New Visitor (direct) (none) Amazon Silk mobile 2020-07-14 1
2 New Visitor (direct) (none) Amazon Silk tablet 2020-07-14 1
3 New Visitor (direct) (none) Amazon Silk tablet 2020-08-07 1
4 New Visitor (direct) (none) Amazon Silk tablet 2020-08-12 1

Count the number of unique values in a column

The simplest way to use value_counts() is to append the function to a Pandas dataframe column. This will count the number of times each value occurs within the whole column and return the data in a series. By default, the data are returned in descending order.

df['Browser'].value_counts()
Chrome                            6869
Safari                            1379
Edge                               817
Samsung Internet                   321
Amazon Silk                        216
Firefox                            177
Internet Explorer                  130
Android Webview                     45
Android Browser                     16
Opera                               10
Safari (in-app)                     10
Opera Mini                           3
awin.com - site screen shotter       2
Playstation 4                        2
UC Browser                           1
Mozilla Compatible Agent             1
Iron                                 1
Name: Browser, dtype: int64

If you want to display the data in ascending order of the number of unique values encountered, you can add the ascending=True parameter to the brackets. To display the output in a Pandas dataframe, simply append .to_frame() to the end of your code.

df['Browser'].value_counts(ascending=True).to_frame()
Browser
Iron 1
Mozilla Compatible Agent 1
UC Browser 1
Playstation 4 2
awin.com - site screen shotter 2
Opera Mini 3
Safari (in-app) 10
Opera 10
Android Browser 16
Android Webview 45
Internet Explorer 130
Firefox 177
Amazon Silk 216
Samsung Internet 321
Edge 817
Safari 1379
Chrome 6869

Get the frequency distribution of each column value

Another really neat trick you can do is calculate a frequency distribution. By adding the normalize=True argument, you get back the percentage of the total values that are made up by each row.

df['Device Category'].value_counts(normalize=True)
desktop    0.4882
mobile     0.4565
tablet     0.0553
Name: Device Category, dtype: float64

Binning continuous variables using their values

Another useful argument is bins. This will count the unique values in the column and then bin them into the number of bins defined in the argument.

df['Pageviews'].value_counts(bins=5)
(0.986, 3.6]    9579
(3.6, 6.2]       371
(6.2, 8.8]        32
(8.8, 11.4]       15
(11.4, 14.0]       3
Name: Pageviews, dtype: int64

Counting only values meeting a certain condition

If you want to count the unique values that make a certain condition, you can append a simple lambda function. In the example below we’ll count the number of times each Browser appears when there are more than 100 occurrences in the column, and then do the same for browsers with fewer than 10 occurrences.

df['Browser'].value_counts().loc[lambda x : x > 100]
Chrome               6869
Safari               1379
Edge                  817
Samsung Internet      321
Amazon Silk           216
Firefox               177
Internet Explorer     130
Name: Browser, dtype: int64
df['Browser'].value_counts().loc[lambda x : x < 10]
Opera Mini                        3
awin.com - site screen shotter    2
Playstation 4                     2
UC Browser                        1
Mozilla Compatible Agent          1
Iron                              1
Name: Browser, dtype: int64

Using value counts with a groupby

The value_counts() function doesn’t need to be confined to a single column. You can also use it in conjunction with a groupby. In the example below, we’ll group the traffic by Medium, and then return the value_counts() for each Source.

df.groupby('Medium')['Source'].value_counts()
Medium    Source                         
(none)    (direct)                           2456
email     Fly and Lure                          1
          automation                            1
organic   google                             6225
          bing                                924
          duckduckgo                          331
          ecosia.org                           29
referral  facebook.com                         17
          fr.search.yahoo.com                   4
          advancedsearch2.virginmedia.com       3
          adguard.com                           1
          atlanticbb.net                        1
          au.search.yahoo.com                   1
          baidu.com                             1
          ca.search.yahoo.com                   1
          cn.bing.com                           1
          de.search.yahoo.com                   1
          fish-fishingflies.co.uk               1
          from.flipboard.com                    1
Name: Source, dtype: int64

Matt Clarke, Monday, March 08, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Joining Data with pandas

Learn to combine data from multiple tables by joining data together using pandas.

Start course for FREE

Comments