The Pandas value_counts()
function can be used to count the number of times a value occurs within a dataframe column or series, as well as calculating frequency distributions. Here’s a quick guide to using it in your Exploratory Data Analysis.
Open a Jupyter notebook, import pandas
as pd
and then use the read_csv()
function to load up some data. I’m using a Google Analytics dataset containing some traffic from one of my other sites.
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/google-analytics.csv')
df.head()
User Type | Source | Medium | Browser | Device Category | Date | Pageviews | |
---|---|---|---|---|---|---|---|
0 | New Visitor | (direct) | (none) | Amazon Silk | mobile | 2020-07-31 | 3 |
1 | New Visitor | (direct) | (none) | Amazon Silk | mobile | 2020-07-14 | 1 |
2 | New Visitor | (direct) | (none) | Amazon Silk | tablet | 2020-07-14 | 1 |
3 | New Visitor | (direct) | (none) | Amazon Silk | tablet | 2020-08-07 | 1 |
4 | New Visitor | (direct) | (none) | Amazon Silk | tablet | 2020-08-12 | 1 |
The simplest way to use value_counts()
is to append the function to a Pandas dataframe column. This will count the number of times each value occurs within the whole column and return the data in a series. By default, the data are returned in descending order.
df['Browser'].value_counts()
Chrome 6869
Safari 1379
Edge 817
Samsung Internet 321
Amazon Silk 216
Firefox 177
Internet Explorer 130
Android Webview 45
Android Browser 16
Opera 10
Safari (in-app) 10
Opera Mini 3
awin.com - site screen shotter 2
Playstation 4 2
UC Browser 1
Mozilla Compatible Agent 1
Iron 1
Name: Browser, dtype: int64
If you want to display the data in ascending order of the number of unique values encountered, you can add the ascending=True
parameter to the brackets. To display the output in a Pandas dataframe, simply append .to_frame()
to the end of your code.
df['Browser'].value_counts(ascending=True).to_frame()
Browser | |
---|---|
Iron | 1 |
Mozilla Compatible Agent | 1 |
UC Browser | 1 |
Playstation 4 | 2 |
awin.com - site screen shotter | 2 |
Opera Mini | 3 |
Safari (in-app) | 10 |
Opera | 10 |
Android Browser | 16 |
Android Webview | 45 |
Internet Explorer | 130 |
Firefox | 177 |
Amazon Silk | 216 |
Samsung Internet | 321 |
Edge | 817 |
Safari | 1379 |
Chrome | 6869 |
Another really neat trick you can do is calculate a frequency distribution. By adding the normalize=True
argument, you get back the percentage of the total values that are made up by each row.
df['Device Category'].value_counts(normalize=True)
desktop 0.4882
mobile 0.4565
tablet 0.0553
Name: Device Category, dtype: float64
Another useful argument is bins
. This will count the unique values in the column and then bin them into the number of bins defined in the argument.
df['Pageviews'].value_counts(bins=5)
(0.986, 3.6] 9579
(3.6, 6.2] 371
(6.2, 8.8] 32
(8.8, 11.4] 15
(11.4, 14.0] 3
Name: Pageviews, dtype: int64
If you want to count the unique values that make a certain condition, you can append a simple lambda
function. In the example below we’ll count the number of times each Browser
appears when there are more than 100 occurrences in the column, and then do the same for browsers with fewer than 10 occurrences.
df['Browser'].value_counts().loc[lambda x : x > 100]
Chrome 6869
Safari 1379
Edge 817
Samsung Internet 321
Amazon Silk 216
Firefox 177
Internet Explorer 130
Name: Browser, dtype: int64
df['Browser'].value_counts().loc[lambda x : x < 10]
Opera Mini 3
awin.com - site screen shotter 2
Playstation 4 2
UC Browser 1
Mozilla Compatible Agent 1
Iron 1
Name: Browser, dtype: int64
The value_counts()
function doesn’t need to be confined to a single column. You can also use it in conjunction with a groupby
. In the example below, we’ll group the traffic by Medium
, and then return the value_counts()
for each Source
.
df.groupby('Medium')['Source'].value_counts()
Medium Source
(none) (direct) 2456
email Fly and Lure 1
automation 1
organic google 6225
bing 924
duckduckgo 331
ecosia.org 29
referral facebook.com 17
fr.search.yahoo.com 4
advancedsearch2.virginmedia.com 3
adguard.com 1
atlanticbb.net 1
au.search.yahoo.com 1
baidu.com 1
ca.search.yahoo.com 1
cn.bing.com 1
de.search.yahoo.com 1
fish-fishingflies.co.uk 1
from.flipboard.com 1
Name: Source, dtype: int64
Matt Clarke, Monday, March 08, 2021