Google Analytics provides a useful Shopping Behaviour Analysis report that lets you examine the volumes of users who are performing important actions on your ecommerce website, such as viewing products, adding them to the basket, reaching the checkout, and successfully making a transaction.
Since you can’t (usually) sell products on an ecommerce website unless customers view a product page, add an item to their cart, or visit the checkout page, these are essential micro conversions in the step towards the final goal of a customer completing a purchase or converting. Similarly, the more people who abandon their cart or checkout the more likely it is that you’ll have a hard time increasing conversion rates.
In order for an ecommerce site to increase its conversion rate it is essential for it to focus on measuring and optimising the micro conversions shown within the Shopping Behaviour report.
The Shopping Behaviour reports allow you to observe critical ecommerce metrics on completion rates and abandonment rates, so you can spot potential issues and look for areas to optimise. Unfortunately, these useful metrics are not exposed via the Google Analytics API as they are calculated internally.
In this project I’ll show you how you can use my GAPandas Google Analytics Reporting API package for Python to extract the required raw data and calculate the same metrics yourself in Pandas so you can monitor them and work towards increasing your conversion rate.
The Shopping Behaviour and Checkout Behaviour reports are created by Google Analytics using data stored against the ga:shoppingStage
dimension. The data in this dimension are populated by Enhanced Ecommerce tracking in the Universal Analytics version of Google Analytics that came before the new GA4.
If do not have Enhanced Ecommerce tracking as part of your GA implementation, you won’t be able to extract any data and the Shopping Behaviour report will not be populated.
Each time a page is viewed, your Google Analytics tracking tells the Google Analytics server what shopping stage you’re at by passing in the relevant value based on events and other parameters you configure during setup.
Therefore, while the Google Analytics API does not give you direct access to the cart abandonment rate, checkout abandonment rate, or cart completion rate, or checkout completion rate, you can manually calculate them using the raw data, once you understand how the calculations are performed within Google Analytics.
Here are the values the Google Analytics API adds to the ga:shoppingStage
dimension: ADD_TO_CART
, ADD_TO_CART_WITH_VIEW
, ALL_VISITS
, CART_ABANDONMENT
, CHECKOUT
, CHECKOUT_1
, CHECKOUT_1_ABANDONMENT
, CHECKOUT_2
, CHECKOUT_ABANDONMENT
, CHECKOUT_WITHOUT_CART_ADDITION
, CHECKOUT_WITH_CART_ADDITION
, NO_CART_ADDITION
, NO_PRODUCT_VIEW
, NO_SHOPPING_ACTIVITY
, PRODUCT_VIEW
, TRANSACTION
, TRANSACTION_WITHOUT_CHECKOUT
, CHECKOUT_2_ABANDONMENT
, and ADD_TO_CART_WITHOUT_VIEW
.
By querying these data individually you will have access to the raw metrics for each of these stages, and by performing some simple calculations you can additionally calculate cart and checkout completion and abandonment rates, among other useful ecommerce metrics.
In the Google Analytics interface you’ll see a tab on the Shopping Behaviour report called “shopping progression” when you click “Sessions”. This includes raw data on the number of sessions within the selected period, and the number of sessions with product views, adds to cart, checkouts, and transactions. Each of these individual metrics is raw data that can be queried by filtering against the ga:shoppingStage
dimension using Google Analytics API operators.
Metric | Description |
---|---|
All sessions | ga:shoppingStage==ALL_VISITS shows sessions from all users and is used in the internal calculations for completion and abandonment rates. |
Sessions with Product Views | ga:shoppingStage==PRODUCT_VIEW shows whether a session included a product page view. |
Sessions with Add to Cart | ga:shoppingStage==ADD_TO_CART shows whether a session included an add-to-cart event. |
Sessions with Checkout | ga:shoppingStage==CHECKOUT shows whether a session included a pageview of the checkout. |
Sessions with Transactions | ga:shoppingStage==TRANSACTION shows whether a session included a successful transaction. |
When you click the “Abandonments” tab on the Shopping Behaviour report in Google Analytics you’ll be able to observe the abandonment rates and metrics for each of the shopping stages.
These are essentially the opposite of the metrics above, so you get back the number of sessions with no shopping activity, no cart addition, cart abandonment, and checkout abandonment as raw numbers. As with the previous example, these can all be extracted by filtering against the ga:shoppingStage
dimension using the double equals operator ==
.
Metric | Description |
---|---|
No Shopping Activity | ga:shoppingStage==NO_SHOPPING_ACTIVITY shows whether a session had no shopping activity. |
Sessions with no Product Views | ga:shoppingStage==NO_PRODUCT_VIEW shows whether a session had no product views. |
No Cart Addition | ga:shoppingStage==NO_CART_ADDITION shows whether a session had no add to carts. |
Cart Abandonment | ga:shoppingStage==CART_ABANDONMENT shows whether a session had a cart abandonment. |
Checkout Abandonment | ga:shoppingStage==CHECKOUT_ABANDONMENT shows whether a session had a checkout abandonment. |
Sessions without Transactions | ga:shoppingStage== shows whether a checkout had no transaction. |
Since you can’t export the completion rates from the Google Analytics API we need to calculate them manually by extracting two values from the ga:shoppingStage
dimension. The below table shows how to calculate each of the completion rates shown.
Metric | Description |
---|---|
% Sessions with product views | ga:shoppingStage==PRODUCT_VIEWS / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user viewed a product page. |
% Sessions with add to cart | ga:shoppingStage==ADD_TO_CART / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user added to cart. |
% Sessions with checkout | ga:shoppingStage==CHECKOUT / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user reached the checkout. |
% Sessions with transactions | ga:shoppingStage==TRANSACTION / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user successfully ordered. |
Percentage abandonment rates can’t be extracted from the Google Analytics API individually, but can be calculated by querying two values from the ga:shoppingStage
dimension and then dividing them to obtain the rate. Here’s how to calculate them yourself.
Metric | Description |
---|---|
% No shopping activity | ga:shoppingStage==NO_SHOPPING_ACTIVITY / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where there was no shopping activity. |
% No cart addition | ga:shoppingStage==NO_CART_ADDITION / ga:shoppingStage==PRODUCT_VIEW shows the percentage of sessions where customers viewed the product page but did not add to cart. |
% Cart abandonment | ga:shoppingStage==CART_ABANDONMENT / ga:shoppingStage==ADD_TO_CART shows the percentage of sessions where customers added to cart and then abandoned the cart. |
% Cart abandonment | ga:shoppingStage==CHECKOUT_ABANDONMENT / ga:shoppingStage==CHECKOUT shows the percentage of sessions where customers reached the checkout but abandoned. |
Now we’ve covered how the Shopping Behaviour abandonment and completion metrics work, we’ll write some Python code to extract the data from your own Google Analytics account and calculate the metrics GA doesn’t let you have via the regular API.
To get started, we’ll need to open a Jupyter notebook and load the packages required. We’ll be using Pandas for data analysis, and GAPandas for accessing the Google Analytics Reporting API. You can install GAPandas via PyPi by entering !pip3 install --upgrade gapandas
.
!pip3 install --upgrade gapandas
import pandas as pd
import gapandas as gp
We’ll need to set up GAPandas by specifying the Google Analytics view ID and the date range we want to query. We’ll also need a JSON client secrets keyfile to authenticate with the Google Analytics Reporting API. You can get this by creating a new project in the Google Developers Console and creating a JSON keyfile.
service = gp.get_service('client_secrets.json')
To save repeating ourselves in the queries that follow, and to allow you to re-use your Jupyter notebook at a later date, we can store the Google Analytics view ID and the start_date
and end_date
for the period we wish to examine in some variables.
view = '123456789'
start_date = '2021-01-01'
end_date = '2022-01-25'
Next, we’ll write a function to extract the individual metrics from the ga:shoppingStage
dimension. Creeating a function means you can reuse the same code for multiple queries, simply by changing the argument you pass to the query.
In the function we’re constructing an API payload dictionary containing the start_date
and end_date
variables, the metrics
we want to extract, and the dimensions
we want to group by. I’ve included ga:userType
in mine, as well as the date
since it’s common to see differences in completion and abandonment rates according to whether a session is from a new or returning user.
We’ll use filters
to pass in the ga:shoppingStage
value, which needs to match one of the ones used in Google Analytics and in the tables we covered above.
def sessions_by_shopping_stage(view, start_date, end_date, shopping_stage, user_type=None):
"""Get sessions by ga:shoppingStage for a given date period by date.
Args:
view (int): Google Analytics view ID
start_date (date): Start date for period in Google Analytics date format
end_date (date): End date for period in Google Analytics date format
shopping_stage (string): Google Analytics ga:shoppingStage value from below list
ADD_TO_CART, ADD_TO_CART_WITH_VIEW, ALL_VISITS,
CART_ABANDONMENT, CHECKOUT, CHECKOUT_1,
CHECKOUT_1_ABANDONMENT, CHECKOUT_2, CHECKOUT_ABANDONMENT,
CHECKOUT_WITHOUT_CART_ADDITION, CHECKOUT_WITH_CART_ADDITION,
NO_CART_ADDITION, NO_PRODUCT_VIEW, NO_SHOPPING_ACTIVITY,
PRODUCT_VIEW, TRANSACTION, TRANSACTION_WITHOUT_CHECKOUT,
CHECKOUT_2_ABANDONMENT, ADD_TO_CART_WITHOUT_VIEW
user_type (string): Optional user type (default = None). Values = New Visitor or Returning Visitor
Result:
Pandas dataframe containing the date and number of sessions for a given ga:shoppingStage
"""
payload = {
'start_date': start_date,
'end_date': end_date,
'metrics': 'ga:sessions',
'dimensions': 'ga:date, ga:userType',
'filters': 'ga:shoppingStage==' + shopping_stage
}
if user_type:
payload['segment'] = 'sessions::condition::ga:userType==' + user_type
df = gp.run_query(service, view, payload)
df['sessions'] = df['sessions'].astype(int)
return df
Now we’ve got our function, we can use it to query the Google Analytics data for the number of sessions with product views, simply by passing in the default arguments and the PRODUCT_VIEW
value for the ga:shoppingStage
.
df_sessions_with_product_views = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW')
df_sessions_with_product_views.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 163 |
1 | 2022-01-23 | Returning Visitor | 61 |
By passing in the optional New Visitor
value to the user_type
argument, we can see the number of sessions with product views specifically for new visitors.
df_sessions_with_product_views_new = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW', 'New Visitor')
df_sessions_with_product_views_new.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 163 |
Similarly, we can repeat the query and create a segment for returning visitors by passing in the value Returning Visitor
. It’s normal for these values to be higher, since returning visitors typically have a higher propensity to purchase, often because they know and trust you and have purchased before.
df_sessions_with_product_views_returning = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW', 'Returning Visitor')
df_sessions_with_product_views_returning.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | Returning Visitor | 61 |
df_sessions_with_add_to_cart = sessions_by_shopping_stage(view, start_date, end_date, 'ADD_TO_CART')
df_sessions_with_add_to_cart.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 45 |
1 | 2022-01-23 | Returning Visitor | 23 |
df_sessions_with_checkout = sessions_by_shopping_stage(view, start_date, end_date, 'CHECKOUT')
df_sessions_with_checkout.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 29 |
1 | 2022-01-23 | Returning Visitor | 10 |
df_sessions_with_transaction = sessions_by_shopping_stage(view, start_date, end_date, 'TRANSACTION')
df_sessions_with_transaction.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 22 |
1 | 2022-01-23 | Returning Visitor | 14 |
df_sessions_with_no_shopping_activity = sessions_by_shopping_stage(view, start_date, end_date, 'NO_SHOPPING_ACTIVITY')
df_sessions_with_no_shopping_activity.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 61 |
1 | 2022-01-23 | Returning Visitor | 31 |
df_sessions_with_no_cart_addition = sessions_by_shopping_stage(view, start_date, end_date, 'NO_CART_ADDITION')
df_sessions_with_no_cart_addition.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 117 |
1 | 2022-01-23 | Returning Visitor | 37 |
df_sessions_with_cart_abandonment = sessions_by_shopping_stage(view, start_date, end_date, 'CART_ABANDONMENT')
df_sessions_with_cart_abandonment.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 15 |
1 | 2022-01-23 | Returning Visitor | 13 |
df_sessions_with_checkout_abandonment = sessions_by_shopping_stage(view, start_date, end_date, 'CHECKOUT_ABANDONMENT')
df_sessions_with_checkout_abandonment.head()
date | userType | sessions | |
---|---|---|---|
0 | 2022-01-23 | New Visitor | 9 |
1 | 2022-01-23 | Returning Visitor | 1 |
The other (more efficient) approach to querying shopping stage data in the Google Analytics API is to fetch all the ga:shoppingStage
values and then use the Pandas melt()
or pivot()
functions to re-shape the data to create a wide format dataframe with each value in its own column.
This approach is shown below and is essential if you wish to calculate cart or checkout abandonment rates or completion rates, since these require would otherwise require two queries to the Google Analytics API using the earlier method for extracting single values.
start_date = '2021-12-01'
end_date = '2022-01-23'
payload = {
'start_date': start_date,
'end_date': end_date,
'metrics': 'ga:sessions',
'dimensions': 'ga:date,ga:shoppingStage'
}
df = gp.run_query(service, view, payload)
df['sessions'] = df['sessions'].astype(int)
df.head()
date | shoppingStage | sessions | |
---|---|---|---|
0 | 2021-12-01 | ADD_TO_CART | 32 |
1 | 2021-12-01 | ADD_TO_CART_WITHOUT_VIEW | 1 |
2 | 2021-12-01 | ADD_TO_CART_WITH_VIEW | 31 |
3 | 2021-12-01 | ALL_VISITS | 287 |
4 | 2021-12-01 | CART_ABANDONMENT | 4 |
The easiest way to use Pandas to re-shape the data is via the pivot()
function. Using pivot()
we can convert the long form dataframe into a wide format dataframe. To do that we’ll set the date
column as the index, set the shoppingStage
as the column names, and put the sessions
metric in for the values. We need to use fillna(0)
to turn NaN
values into zeroes and then reset the index using reset_index()
.
df = df.pivot(index='date', columns='shoppingStage', values='sessions').fillna(0).reset_index()
df.head()
shoppingStage | date | ADD_TO_CART | ADD_TO_CART_WITHOUT_VIEW | ADD_TO_CART_WITH_VIEW | ALL_VISITS | CART_ABANDONMENT | CHECKOUT | CHECKOUT_1 | CHECKOUT_1_ABANDONMENT | CHECKOUT_2 | CHECKOUT_2_ABANDONMENT | CHECKOUT_2_WITHOUT_CHECKOUT_1 | CHECKOUT_ABANDONMENT | CHECKOUT_WITHOUT_CART_ADDITION | CHECKOUT_WITH_CART_ADDITION | NO_CART_ADDITION | NO_PRODUCT_VIEW | NO_SHOPPING_ACTIVITY | PRODUCT_VIEW | TRANSACTION | TRANSACTION_WITHOUT_CHECKOUT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-12-01 | 32.0 | 1.0 | 31.0 | 287.0 | 4.0 | 25.0 | 25.0 | 4.0 | 20.0 | 3.0 | 0.0 | 7.0 | 4.0 | 21.0 | 137.0 | 5.0 | 113.0 | 169.0 | 26.0 | 8.0 |
1 | 2021-12-02 | 36.0 | 0.0 | 36.0 | 325.0 | 10.0 | 24.0 | 24.0 | 5.0 | 19.0 | 2.0 | 0.0 | 7.0 | 5.0 | 19.0 | 146.0 | 3.0 | 135.0 | 187.0 | 27.0 | 10.0 |
2 | 2021-12-03 | 42.0 | 0.0 | 42.0 | 238.0 | 11.0 | 25.0 | 25.0 | 3.0 | 21.0 | 2.0 | 0.0 | 5.0 | 2.0 | 23.0 | 106.0 | 1.0 | 88.0 | 149.0 | 28.0 | 8.0 |
3 | 2021-12-04 | 24.0 | 0.0 | 24.0 | 215.0 | 9.0 | 13.0 | 13.0 | 2.0 | 10.0 | 1.0 | 0.0 | 3.0 | 3.0 | 10.0 | 96.0 | 2.0 | 92.0 | 121.0 | 15.0 | 5.0 |
4 | 2021-12-05 | 67.0 | 0.0 | 67.0 | 342.0 | 21.0 | 42.0 | 42.0 | 7.0 | 31.0 | 3.0 | 0.0 | 10.0 | 4.0 | 38.0 | 146.0 | 4.0 | 124.0 | 214.0 | 41.0 | 9.0 |
To specifically extract only the completion metrics we can filter the Pandas dataframe so it shows only specific columns with values that match the shopping behaviour reports. You can do this by passing a list of column names from the dataframe that correspond to the values that were stored in the ga:shoppingStage
dimension when the original data were exported.
df_shopping_behaviour = df[['date', 'ALL_VISITS', 'PRODUCT_VIEW',
'ADD_TO_CART', 'CHECKOUT', 'TRANSACTION']]
df_shopping_behaviour.head()
shoppingStage | date | ALL_VISITS | PRODUCT_VIEW | ADD_TO_CART | CHECKOUT | TRANSACTION |
---|---|---|---|---|---|---|
0 | 2021-12-01 | 287.0 | 169.0 | 32.0 | 25.0 | 26.0 |
1 | 2021-12-02 | 325.0 | 187.0 | 36.0 | 24.0 | 27.0 |
2 | 2021-12-03 | 238.0 | 149.0 | 42.0 | 25.0 | 28.0 |
3 | 2021-12-04 | 215.0 | 121.0 | 24.0 | 13.0 | 15.0 |
4 | 2021-12-05 | 342.0 | 214.0 | 67.0 | 42.0 | 41.0 |
To calculate completion rates, such as sessions with product views, sessions with add to cart, sessions with checkout, and sessions with transactions, we need to use two values from the dimension and then divide them accordingly to calculate the rate, then multiply that by 100 to express it as a percentage.
df_completion_rates = df[['date', 'ALL_VISITS', 'PRODUCT_VIEW',
'ADD_TO_CART', 'CHECKOUT', 'TRANSACTION']]
df_completion_rates = df_completion_rates.assign(\
sessions_with_product_views = ((df_completion_rates['PRODUCT_VIEW'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
sessions_with_add_to_cart = ((df_completion_rates['ADD_TO_CART'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
sessions_with_checkout = ((df_completion_rates['CHECKOUT'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
sessions_with_transactions = ((df_completion_rates['TRANSACTION'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates[['date', 'sessions_with_product_views', 'sessions_with_add_to_cart',
'sessions_with_checkout', 'sessions_with_transactions']]
df_completion_rates.head()
shoppingStage | date | sessions_with_product_views | sessions_with_add_to_cart | sessions_with_checkout | sessions_with_transactions |
---|---|---|---|---|---|
0 | 2021-12-01 | 58.885017 | 11.149826 | 8.710801 | 9.059233 |
1 | 2021-12-02 | 57.538462 | 11.076923 | 7.384615 | 8.307692 |
2 | 2021-12-03 | 62.605042 | 17.647059 | 10.504202 | 11.764706 |
3 | 2021-12-04 | 56.279070 | 11.162791 | 6.046512 | 6.976744 |
4 | 2021-12-05 | 62.573099 | 19.590643 | 12.280702 | 11.988304 |
You can use a similar approach to filter the Pandas dataframe to show only the abandonment metrics, simply by passing in a list of the metrics you want to keep.
df_abandonment_metrics = df[['date', 'ALL_VISITS', 'NO_SHOPPING_ACTIVITY',
'NO_CART_ADDITION', 'CART_ABANDONMENT', 'CHECKOUT_ABANDONMENT']]
df_abandonment_metrics.head()
shoppingStage | date | ALL_VISITS | NO_SHOPPING_ACTIVITY | NO_CART_ADDITION | CART_ABANDONMENT | CHECKOUT_ABANDONMENT |
---|---|---|---|---|---|---|
0 | 2021-12-01 | 287.0 | 113.0 | 137.0 | 4.0 | 7.0 |
1 | 2021-12-02 | 325.0 | 135.0 | 146.0 | 10.0 | 7.0 |
2 | 2021-12-03 | 238.0 | 88.0 | 106.0 | 11.0 | 5.0 |
3 | 2021-12-04 | 215.0 | 92.0 | 96.0 | 9.0 | 3.0 |
4 | 2021-12-05 | 342.0 | 124.0 | 146.0 | 21.0 | 10.0 |
Finally, we can calculate the abandonment rates. Unlike the completion rates, which all use ALL_VISITS
, the abandonment rates need to use specific metrics stored in ga:shoppingStage
to ensure they show exactly what we need.
df_abandonment_rates = df.copy()
df_abandonment_rates = df_abandonment_rates.assign(\
sessions_with_no_shipping_activity = ((df_abandonment_rates['NO_SHOPPING_ACTIVITY'] / df_abandonment_rates['ALL_VISITS'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
sessions_with_no_cart_addition = ((df_abandonment_rates['NO_CART_ADDITION'] / df_abandonment_rates['PRODUCT_VIEW'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
sessions_with_cart_abandonment = ((df_abandonment_rates['CART_ABANDONMENT'] / df_abandonment_rates['ADD_TO_CART'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
sessions_with_checkout_abandonment = ((df_abandonment_rates['CHECKOUT_ABANDONMENT'] / df_abandonment_rates['CHECKOUT'])*100))
df_abandonment_rates = df_abandonment_rates[['date', 'sessions_with_no_shipping_activity', 'sessions_with_no_cart_addition',
'sessions_with_cart_abandonment', 'sessions_with_checkout_abandonment']]
And that’s it. You can now extract completion and abandonment data from the Google Analytics API, even though Google doesn’t let you do this by default. It’s fairly easy to change the payload
to turn these data into a report that you can use to monitor the metrics over time to help your ecommerce team focus on improvement and keep an eye out for potential site issues.
df_abandonment_rates.head()
shoppingStage | date | sessions_with_no_shipping_activity | sessions_with_no_cart_addition | sessions_with_cart_abandonment | sessions_with_checkout_abandonment |
---|---|---|---|---|---|
0 | 2021-12-01 | 39.372822 | 81.065089 | 12.500000 | 28.000000 |
1 | 2021-12-02 | 41.538462 | 78.074866 | 27.777778 | 29.166667 |
2 | 2021-12-03 | 36.974790 | 71.140940 | 26.190476 | 20.000000 |
3 | 2021-12-04 | 42.790698 | 79.338843 | 37.500000 | 23.076923 |
4 | 2021-12-05 | 36.257310 | 68.224299 | 31.343284 | 23.809524 |
Matt Clarke, Tuesday, January 25, 2022