How to visualise analytics data using heatmaps in Seaborn

Picture by Fabian Irsara, Unsplash.

14 minutes to read

Heatmaps are one of the most intuitive ways to display data across two dimensions, and they work particularly well on temporal data, such as web analytics metrics. They’re a great way to visualise web metrics and allow you to spot trends and patterns without the need to go over any numbers. Here’s how you can create them using Seaborn and GAPandas.

Load the packages

For this project we’ll be using GAPandas to fetch data from the Google Analytics reporting API, Pandas for displaying the tabular data in dataframes, and Seaborn and Matplotlib for data visualisation using heatmaps.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from gapandas import connect, query

Fetch the data

First, obtain your client_secrets.json key for a Service Account and ensure the email address associated with it is present in your Google Analytics account. Then format an API query payload to fetch your data. To display data on a heatmap you need two categorical columns, such as the year and month, or hour and day, and one metric. We’ll start off by fetching data for sessions for each month over the past few years.

service = connect.get_service('personal_client_secrets.json')

payload = {
    'start_date': '2017-01-01',
    'end_date': '2020-10-31',
    'metrics': 'ga:sessions',
    'dimensions': 'ga:year, ga:month'
}

df = query.run_query(service, '101646265', payload)

We’ll need to do a bit of reformatting on the data returned by GAPandas to ensure the year, month and sessions are set to the correct data type. You can check the data types by entering df.info().

df['year'] = df['year'].astype(int)
df['month'] = df['month'].astype(int)
df['sessions'] = df['sessions'].astype(int)
df.head()

	year	month	sessions
0	2017	1	3907
1	2017	2	3822
2	2017	3	4989
3	2017	4	6181
4	2017	5	6312

Re-shape from long to wide

Next we need to re-shape our data from the current long format to wide format. There are several ways to do this, but the easiest is to use the pivot() function. We’ll run pivot() on the dataframe, set the year as the index column, set the columns to be the month number and the values within to be the sessions metrics. Finally, we will fill any missing values with 0 and set the data type to int.

df_monthly = df.pivot(index='year', columns='month', values='sessions').fillna(0).astype(int)
df_monthly

month	1	2	3	4	5	6	7	8	9	10	11	12
year
2017	3907	3822	4989	6181	6312	5249	4693	5346	4885	5802	5383	5044
2018	7316	6735	7905	9050	9669	9695	9866	13227	11994	11975	11156	11873
2019	15209	15645	18539	17563	18534	18460	18400	17060	14235	14257	12662	11233
2020	14650	15516	17029	15555	28553	32476	30106	26461	23159	20839	0	0

Create a monthly sessions heatmap

Now our data is in the right format we can use Seaborn and Matplotlib to visualise it on a heatmap. As our heatmap includes the metrics on each tile, it’s worth defining the size of the heatmap image to ensure it is legible. We can do this with f, ax = plt.subplots(figsize=(20, 5)), which sets the heatmap to 20 inches by 5 inches. There are many colour palettes available for Seaborn, but I’d highly recommend you choose one which starts pale and gets darker, otherwise your heatmap will be harder to decipher.

Next we’ll use the Seaborn heatmap() function to create the heatmap itself. We pass in the df_monthly dataframe in wide format, set annotations to appear using annot=True, set the formatting to d using fmt="d", pass in the ax settings and the colour map with cmap, and set the tiles to be displayed as squares. Finally, we use a bit of Matplotlib code to add in a custom title and named axes.

f, ax = plt.subplots(figsize=(20, 5))
cmap = sns.color_palette("Blues")

monthly_sessions = sns.heatmap(df_monthly, 
                    annot=True, 
                    fmt="d", 
                    linewidths=5, 
                    ax=ax, 
                    cmap=cmap, 
                    square=True)

ax.axes.set_title("Sessions by month and year",fontsize=20)
ax.set_xlabel("Month",fontsize=15)
ax.set_ylabel("Year",fontsize=15)
plt.show()

png

As you can see from the above heatmap, we get back a visualisation with a row per year and one tile representing each calendar month. The sessions are printed onto each tile and they change in colour from pale blue to dark blue depending on the number of sessions. This makes it really easy to see when the site traffic was busiest.

Fetch more data

To expand upon the above example we’ll now create another heatmap showing how sessions vary for each hour and day. The query we pass to the Google Analytics API via GAPandas is very similar, but we specify the dayOfWeekName and the hour.

payload = {
    'start_date': '2017-01-01',
    'end_date': '2020-10-31',
    'metrics': 'ga:sessions',
    'dimensions': 'ga:dayOfWeekName, ga:hour'
}

df = query.run_query(service, '101646265', payload)

As with the previous example, we need to do a little modification to the data returned from GAPandas to ensure it’s in the right format for plotting onto the heatmap. All that is required is setting the hour and session column to int using the astype() function.

df['hour'] = df['hour'].astype(int)
df['sessions'] = df['sessions'].astype(int)

Re-shape the data

Again, as we need the data to be in wide format instead of long format, we need to re-shape it. We can do this using pivot() again. We’ll set the index to the dayOfWeekName, set the columns to be the hour value and the metric to be the sessions. Any missing values will be filled with a 0 using fillna() and the data set to int using astype(). As we need to manipulate the dayOfWeekName column, we’ll reset the index.

df_hourly = df.pivot(index='dayOfWeekName', columns='hour', values='sessions').fillna(0).astype(int)
df_hourly.reset_index()

hour	dayOfWeekName	0	1	2	3	4	5	6	7	8	...	14	15	16	17	18
0	Friday	1879	1500	1446	1326	1219	1529	2021	2594	2932	...	4280	4347	4449	4591	4868
1	Monday	2187	1881	1596	1533	1416	1581	2059	2629	3109	...	4759	5162	5064	5206	6087
2	Saturday	1934	1536	1376	1270	1249	1482	2063	2703	3167	...	4659	4771	4948	5057	5002
3	Sunday	2069	1774	1460	1309	1315	1452	2055	2802	3557	...	4931	5390	5680	5718	6231
4	Thursday	1862	1597	1510	1367	1388	1562	2028	2486	2958	...	4496	4520	4581	4567	4948
5	Tuesday	2120	1707	1671	1504	1506	1656	2198	2650	3078	...	4690	4715	4788	5043	5671
6	Wednesday	1873	1645	1587	1439	1394	1611	2071	2648	2910	...	4342	4657	4597	4692	5245

7 rows × 25 columns

Order the days of the week correctly

The issue with the dataframe above is that the dayOfWeekName values are not in the correct order - they go Friday, Monday, Saturday etc. This will make our heatmap very confusing, so we now need to re-order them. After some head scratching, I figured out a really easy way to do this. By creating a list of days and listing them in the correct order, you can do a groupby() on the data using the dayOfWeekName and then pass the days list in to the reindex() function. This re-orders the weekdays so they’re in the right order.

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df_hourly = df_hourly.groupby(['dayOfWeekName']).sum().reindex(days)
df_hourly.head()

hour	0	1	2	3	4	5	6	7	8	9	...	14	15	16	17	18
dayOfWeekName
Monday	2187	1881	1596	1533	1416	1581	2059	2629	3109	3615	...	4759	5162	5064	5206	6087
Tuesday	2120	1707	1671	1504	1506	1656	2198	2650	3078	3477	...	4690	4715	4788	5043	5671
Wednesday	1873	1645	1587	1439	1394	1611	2071	2648	2910	3291	...	4342	4657	4597	4692	5245
Thursday	1862	1597	1510	1367	1388	1562	2028	2486	2958	3320	...	4496	4520	4581	4567	4948
Friday	1879	1500	1446	1326	1219	1529	2021	2594	2932	3295	...	4280	4347	4449	4591	4868

5 rows × 24 columns

Create the heatmap

The final step, which is just a simple modification of the previous heatmap, is to pass in the df_hourly dataframe which is now in wide format with correctly ordered weekdays and then use Seaborn to create the visualisation. We get back a heatmap showing the hourly sessions for each day of the week over the past few years, allowing us to clearly visualise when the site is busiest.

f, ax = plt.subplots(figsize=(20, 5))
cmap = sns.color_palette("Blues")

monthly_sessions = sns.heatmap(df_hourly, 
                    annot=True, 
                    fmt="d", 
                    linewidths=5, 
                    ax=ax, 
                    cmap=cmap, 
                    square=True)

ax.axes.set_title("Sessions by day and hour",fontsize=20)
ax.set_xlabel("Hour",fontsize=15)
ax.set_ylabel("Day",fontsize=15)
plt.show()

png

Matt Clarke, Saturday, March 06, 2021

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.