How to calculate abandonment and completion rates using the Google Analytics API

The Google Analytics API does not expose Shopping Behaviour report data such as cart and checkout abandonment and completion rates. Here's how to calculate them yourself using GAPandas.

How to calculate abandonment and completion rates using the Google Analytics API
Picture by SHVETS, Pexels.
29 minutes to read

Google Analytics provides a useful Shopping Behaviour Analysis report that lets you examine the volumes of users who are performing important actions on your ecommerce website, such as viewing products, adding them to the basket, reaching the checkout, and successfully making a transaction.

Since you can’t (usually) sell products on an ecommerce website unless customers view a product page, add an item to their cart, or visit the checkout page, these are essential micro conversions in the step towards the final goal of a customer completing a purchase or converting. Similarly, the more people who abandon their cart or checkout the more likely it is that you’ll have a hard time increasing conversion rates.

In order for an ecommerce site to increase its conversion rate it is essential for it to focus on measuring and optimising the micro conversions shown within the Shopping Behaviour report.

The Shopping Behaviour reports allow you to observe critical ecommerce metrics on completion rates and abandonment rates, so you can spot potential issues and look for areas to optimise. Unfortunately, these useful metrics are not exposed via the Google Analytics API as they are calculated internally.

In this project I’ll show you how you can use my GAPandas Google Analytics Reporting API package for Python to extract the required raw data and calculate the same metrics yourself in Pandas so you can monitor them and work towards increasing your conversion rate.

Google Analytics Shopping Behaviour report

How Google Analytics creates Shopping Behaviour reports

The Shopping Behaviour and Checkout Behaviour reports are created by Google Analytics using data stored against the ga:shoppingStage dimension. The data in this dimension are populated by Enhanced Ecommerce tracking in the Universal Analytics version of Google Analytics that came before the new GA4.

If do not have Enhanced Ecommerce tracking as part of your GA implementation, you won’t be able to extract any data and the Shopping Behaviour report will not be populated.

Each time a page is viewed, your Google Analytics tracking tells the Google Analytics server what shopping stage you’re at by passing in the relevant value based on events and other parameters you configure during setup.

Therefore, while the Google Analytics API does not give you direct access to the cart abandonment rate, checkout abandonment rate, or cart completion rate, or checkout completion rate, you can manually calculate them using the raw data, once you understand how the calculations are performed within Google Analytics.

Here are the values the Google Analytics API adds to the ga:shoppingStage dimension: ADD_TO_CART, ADD_TO_CART_WITH_VIEW, ALL_VISITS, CART_ABANDONMENT, CHECKOUT, CHECKOUT_1, CHECKOUT_1_ABANDONMENT, CHECKOUT_2, CHECKOUT_ABANDONMENT, CHECKOUT_WITHOUT_CART_ADDITION, CHECKOUT_WITH_CART_ADDITION, NO_CART_ADDITION, NO_PRODUCT_VIEW, NO_SHOPPING_ACTIVITY, PRODUCT_VIEW, TRANSACTION, TRANSACTION_WITHOUT_CHECKOUT, CHECKOUT_2_ABANDONMENT, and ADD_TO_CART_WITHOUT_VIEW.

By querying these data individually you will have access to the raw metrics for each of these stages, and by performing some simple calculations you can additionally calculate cart and checkout completion and abandonment rates, among other useful ecommerce metrics.

Shopping Behaviour shopping progression data

In the Google Analytics interface you’ll see a tab on the Shopping Behaviour report called “shopping progression” when you click “Sessions”. This includes raw data on the number of sessions within the selected period, and the number of sessions with product views, adds to cart, checkouts, and transactions. Each of these individual metrics is raw data that can be queried by filtering against the ga:shoppingStage dimension using Google Analytics API operators.

Metric Description
All sessions ga:shoppingStage==ALL_VISITS shows sessions from all users and is used in the internal calculations for completion and abandonment rates.
Sessions with Product Views ga:shoppingStage==PRODUCT_VIEW shows whether a session included a product page view.
Sessions with Add to Cart ga:shoppingStage==ADD_TO_CART shows whether a session included an add-to-cart event.
Sessions with Checkout ga:shoppingStage==CHECKOUT shows whether a session included a pageview of the checkout.
Sessions with Transactions ga:shoppingStage==TRANSACTION shows whether a session included a successful transaction.

Shopping Behaviour abandonment data

When you click the “Abandonments” tab on the Shopping Behaviour report in Google Analytics you’ll be able to observe the abandonment rates and metrics for each of the shopping stages.

These are essentially the opposite of the metrics above, so you get back the number of sessions with no shopping activity, no cart addition, cart abandonment, and checkout abandonment as raw numbers. As with the previous example, these can all be extracted by filtering against the ga:shoppingStage dimension using the double equals operator ==.

Metric Description
No Shopping Activity ga:shoppingStage==NO_SHOPPING_ACTIVITY shows whether a session had no shopping activity.
Sessions with no Product Views ga:shoppingStage==NO_PRODUCT_VIEW shows whether a session had no product views.
No Cart Addition ga:shoppingStage==NO_CART_ADDITION shows whether a session had no add to carts.
Cart Abandonment ga:shoppingStage==CART_ABANDONMENT shows whether a session had a cart abandonment.
Checkout Abandonment ga:shoppingStage==CHECKOUT_ABANDONMENT shows whether a session had a checkout abandonment.
Sessions without Transactions ga:shoppingStage== shows whether a checkout had no transaction.

Shopping behaviour completion rates

Since you can’t export the completion rates from the Google Analytics API we need to calculate them manually by extracting two values from the ga:shoppingStage dimension. The below table shows how to calculate each of the completion rates shown.

Metric Description
% Sessions with product views ga:shoppingStage==PRODUCT_VIEWS / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user viewed a product page.
% Sessions with add to cart ga:shoppingStage==ADD_TO_CART / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user added to cart.
% Sessions with checkout ga:shoppingStage==CHECKOUT / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user reached the checkout.
% Sessions with transactions ga:shoppingStage==TRANSACTION / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where the user successfully ordered.

Shopping behaviour abandonment rates

Percentage abandonment rates can’t be extracted from the Google Analytics API individually, but can be calculated by querying two values from the ga:shoppingStage dimension and then dividing them to obtain the rate. Here’s how to calculate them yourself.

Metric Description
% No shopping activity ga:shoppingStage==NO_SHOPPING_ACTIVITY / ga:shoppingStage==ALL_VISITS shows the percentage of sessions where there was no shopping activity.
% No cart addition ga:shoppingStage==NO_CART_ADDITION / ga:shoppingStage==PRODUCT_VIEW shows the percentage of sessions where customers viewed the product page but did not add to cart.
% Cart abandonment ga:shoppingStage==CART_ABANDONMENT / ga:shoppingStage==ADD_TO_CART shows the percentage of sessions where customers added to cart and then abandoned the cart.
% Cart abandonment ga:shoppingStage==CHECKOUT_ABANDONMENT / ga:shoppingStage==CHECKOUT shows the percentage of sessions where customers reached the checkout but abandoned.

Load the packages

Now we’ve covered how the Shopping Behaviour abandonment and completion metrics work, we’ll write some Python code to extract the data from your own Google Analytics account and calculate the metrics GA doesn’t let you have via the regular API.

To get started, we’ll need to open a Jupyter notebook and load the packages required. We’ll be using Pandas for data analysis, and GAPandas for accessing the Google Analytics Reporting API. You can install GAPandas via PyPi by entering !pip3 install --upgrade gapandas.

!pip3 install --upgrade gapandas
import pandas as pd
import gapandas as gp

Set up GAPandas

We’ll need to set up GAPandas by specifying the Google Analytics view ID and the date range we want to query. We’ll also need a JSON client secrets keyfile to authenticate with the Google Analytics Reporting API. You can get this by creating a new project in the Google Developers Console and creating a JSON keyfile.

service = gp.get_service('client_secrets.json')

To save repeating ourselves in the queries that follow, and to allow you to re-use your Jupyter notebook at a later date, we can store the Google Analytics view ID and the start_date and end_date for the period we wish to examine in some variables.

view = '123456789'
start_date = '2021-01-01'
end_date = '2022-01-25'

Querying shoppingStage data from Google Analytics

Next, we’ll write a function to extract the individual metrics from the ga:shoppingStage dimension. Creeating a function means you can reuse the same code for multiple queries, simply by changing the argument you pass to the query.

In the function we’re constructing an API payload dictionary containing the start_date and end_date variables, the metrics we want to extract, and the dimensions we want to group by. I’ve included ga:userType in mine, as well as the date since it’s common to see differences in completion and abandonment rates according to whether a session is from a new or returning user.

We’ll use filters to pass in the ga:shoppingStage value, which needs to match one of the ones used in Google Analytics and in the tables we covered above.

def sessions_by_shopping_stage(view, start_date, end_date, shopping_stage, user_type=None):
    """Get sessions by ga:shoppingStage for a given date period by date. 
    
    Args:
        view (int): Google Analytics view ID
        start_date (date): Start date for period in Google Analytics date format
        end_date (date): End date for period in Google Analytics date format
        shopping_stage (string): Google Analytics ga:shoppingStage value from below list
        
        ADD_TO_CART, ADD_TO_CART_WITH_VIEW, ALL_VISITS, 
        CART_ABANDONMENT, CHECKOUT, CHECKOUT_1, 
        CHECKOUT_1_ABANDONMENT, CHECKOUT_2, CHECKOUT_ABANDONMENT, 
        CHECKOUT_WITHOUT_CART_ADDITION, CHECKOUT_WITH_CART_ADDITION, 
        NO_CART_ADDITION, NO_PRODUCT_VIEW, NO_SHOPPING_ACTIVITY, 
        PRODUCT_VIEW, TRANSACTION, TRANSACTION_WITHOUT_CHECKOUT, 
        CHECKOUT_2_ABANDONMENT, ADD_TO_CART_WITHOUT_VIEW
    
        user_type (string): Optional user type (default = None). Values = New Visitor or Returning Visitor
    
    Result: 
        Pandas dataframe containing the date and number of sessions for a given ga:shoppingStage
    """
    
    payload = {
        'start_date': start_date,
        'end_date': end_date,
        'metrics': 'ga:sessions',
        'dimensions': 'ga:date, ga:userType',
        'filters': 'ga:shoppingStage==' + shopping_stage
    }
    
    if user_type:
        payload['segment'] = 'sessions::condition::ga:userType==' + user_type
    
    df = gp.run_query(service, view, payload)
    df['sessions'] = df['sessions'].astype(int)
    return df

Sessions with product views

Now we’ve got our function, we can use it to query the Google Analytics data for the number of sessions with product views, simply by passing in the default arguments and the PRODUCT_VIEW value for the ga:shoppingStage.

df_sessions_with_product_views = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW')
df_sessions_with_product_views.head()
date userType sessions
0 2022-01-23 New Visitor 163
1 2022-01-23 Returning Visitor 61

By passing in the optional New Visitor value to the user_type argument, we can see the number of sessions with product views specifically for new visitors.

df_sessions_with_product_views_new = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW', 'New Visitor')
df_sessions_with_product_views_new.head()
date userType sessions
0 2022-01-23 New Visitor 163

Similarly, we can repeat the query and create a segment for returning visitors by passing in the value Returning Visitor. It’s normal for these values to be higher, since returning visitors typically have a higher propensity to purchase, often because they know and trust you and have purchased before.

df_sessions_with_product_views_returning = sessions_by_shopping_stage(view, start_date, end_date, 'PRODUCT_VIEW', 'Returning Visitor')
df_sessions_with_product_views_returning.head()
date userType sessions
0 2022-01-23 Returning Visitor 61

Sessions with add to cart

df_sessions_with_add_to_cart = sessions_by_shopping_stage(view, start_date, end_date, 'ADD_TO_CART')
df_sessions_with_add_to_cart.head()
date userType sessions
0 2022-01-23 New Visitor 45
1 2022-01-23 Returning Visitor 23

Sessions with checkout

df_sessions_with_checkout = sessions_by_shopping_stage(view, start_date, end_date, 'CHECKOUT')
df_sessions_with_checkout.head()
date userType sessions
0 2022-01-23 New Visitor 29
1 2022-01-23 Returning Visitor 10

Sessions with transactions

df_sessions_with_transaction = sessions_by_shopping_stage(view, start_date, end_date, 'TRANSACTION')
df_sessions_with_transaction.head()
date userType sessions
0 2022-01-23 New Visitor 22
1 2022-01-23 Returning Visitor 14

Sessions with no shopping activity

df_sessions_with_no_shopping_activity = sessions_by_shopping_stage(view, start_date, end_date, 'NO_SHOPPING_ACTIVITY')
df_sessions_with_no_shopping_activity.head()
date userType sessions
0 2022-01-23 New Visitor 61
1 2022-01-23 Returning Visitor 31

Sessions with no cart addition

df_sessions_with_no_cart_addition = sessions_by_shopping_stage(view, start_date, end_date, 'NO_CART_ADDITION')
df_sessions_with_no_cart_addition.head()
date userType sessions
0 2022-01-23 New Visitor 117
1 2022-01-23 Returning Visitor 37

Sessions with cart abandonment

df_sessions_with_cart_abandonment = sessions_by_shopping_stage(view, start_date, end_date, 'CART_ABANDONMENT')
df_sessions_with_cart_abandonment.head()
date userType sessions
0 2022-01-23 New Visitor 15
1 2022-01-23 Returning Visitor 13

Sessions with checkout abandonment

df_sessions_with_checkout_abandonment = sessions_by_shopping_stage(view, start_date, end_date, 'CHECKOUT_ABANDONMENT')
df_sessions_with_checkout_abandonment.head()
date userType sessions
0 2022-01-23 New Visitor 9
1 2022-01-23 Returning Visitor 1

Sessions by shopping stage

The other (more efficient) approach to querying shopping stage data in the Google Analytics API is to fetch all the ga:shoppingStage values and then use the Pandas melt() or pivot() functions to re-shape the data to create a wide format dataframe with each value in its own column.

This approach is shown below and is essential if you wish to calculate cart or checkout abandonment rates or completion rates, since these require would otherwise require two queries to the Google Analytics API using the earlier method for extracting single values.

start_date = '2021-12-01'
end_date = '2022-01-23'
payload = {
    'start_date': start_date,
    'end_date': end_date,
    'metrics': 'ga:sessions',
    'dimensions': 'ga:date,ga:shoppingStage'
}

df = gp.run_query(service, view, payload)
df['sessions'] = df['sessions'].astype(int)
df.head()
date shoppingStage sessions
0 2021-12-01 ADD_TO_CART 32
1 2021-12-01 ADD_TO_CART_WITHOUT_VIEW 1
2 2021-12-01 ADD_TO_CART_WITH_VIEW 31
3 2021-12-01 ALL_VISITS 287
4 2021-12-01 CART_ABANDONMENT 4

The easiest way to use Pandas to re-shape the data is via the pivot() function. Using pivot() we can convert the long form dataframe into a wide format dataframe. To do that we’ll set the date column as the index, set the shoppingStage as the column names, and put the sessions metric in for the values. We need to use fillna(0) to turn NaN values into zeroes and then reset the index using reset_index().

df = df.pivot(index='date', columns='shoppingStage', values='sessions').fillna(0).reset_index()
df.head()
shoppingStage date ADD_TO_CART ADD_TO_CART_WITHOUT_VIEW ADD_TO_CART_WITH_VIEW ALL_VISITS CART_ABANDONMENT CHECKOUT CHECKOUT_1 CHECKOUT_1_ABANDONMENT CHECKOUT_2 CHECKOUT_2_ABANDONMENT CHECKOUT_2_WITHOUT_CHECKOUT_1 CHECKOUT_ABANDONMENT CHECKOUT_WITHOUT_CART_ADDITION CHECKOUT_WITH_CART_ADDITION NO_CART_ADDITION NO_PRODUCT_VIEW NO_SHOPPING_ACTIVITY PRODUCT_VIEW TRANSACTION TRANSACTION_WITHOUT_CHECKOUT
0 2021-12-01 32.0 1.0 31.0 287.0 4.0 25.0 25.0 4.0 20.0 3.0 0.0 7.0 4.0 21.0 137.0 5.0 113.0 169.0 26.0 8.0
1 2021-12-02 36.0 0.0 36.0 325.0 10.0 24.0 24.0 5.0 19.0 2.0 0.0 7.0 5.0 19.0 146.0 3.0 135.0 187.0 27.0 10.0
2 2021-12-03 42.0 0.0 42.0 238.0 11.0 25.0 25.0 3.0 21.0 2.0 0.0 5.0 2.0 23.0 106.0 1.0 88.0 149.0 28.0 8.0
3 2021-12-04 24.0 0.0 24.0 215.0 9.0 13.0 13.0 2.0 10.0 1.0 0.0 3.0 3.0 10.0 96.0 2.0 92.0 121.0 15.0 5.0
4 2021-12-05 67.0 0.0 67.0 342.0 21.0 42.0 42.0 7.0 31.0 3.0 0.0 10.0 4.0 38.0 146.0 4.0 124.0 214.0 41.0 9.0

Checkout completion metrics

To specifically extract only the completion metrics we can filter the Pandas dataframe so it shows only specific columns with values that match the shopping behaviour reports. You can do this by passing a list of column names from the dataframe that correspond to the values that were stored in the ga:shoppingStage dimension when the original data were exported.

df_shopping_behaviour = df[['date', 'ALL_VISITS', 'PRODUCT_VIEW', 
                            'ADD_TO_CART', 'CHECKOUT', 'TRANSACTION']]
df_shopping_behaviour.head()
shoppingStage date ALL_VISITS PRODUCT_VIEW ADD_TO_CART CHECKOUT TRANSACTION
0 2021-12-01 287.0 169.0 32.0 25.0 26.0
1 2021-12-02 325.0 187.0 36.0 24.0 27.0
2 2021-12-03 238.0 149.0 42.0 25.0 28.0
3 2021-12-04 215.0 121.0 24.0 13.0 15.0
4 2021-12-05 342.0 214.0 67.0 42.0 41.0

Checkout completion rates

To calculate completion rates, such as sessions with product views, sessions with add to cart, sessions with checkout, and sessions with transactions, we need to use two values from the dimension and then divide them accordingly to calculate the rate, then multiply that by 100 to express it as a percentage.

df_completion_rates = df[['date', 'ALL_VISITS', 'PRODUCT_VIEW', 
                          'ADD_TO_CART', 'CHECKOUT', 'TRANSACTION']]
df_completion_rates = df_completion_rates.assign(\
    sessions_with_product_views = ((df_completion_rates['PRODUCT_VIEW'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
    sessions_with_add_to_cart = ((df_completion_rates['ADD_TO_CART'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
    sessions_with_checkout = ((df_completion_rates['CHECKOUT'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates.assign(\
    sessions_with_transactions = ((df_completion_rates['TRANSACTION'] / df_completion_rates['ALL_VISITS'])*100))
df_completion_rates = df_completion_rates[['date', 'sessions_with_product_views', 'sessions_with_add_to_cart', 
                                           'sessions_with_checkout', 'sessions_with_transactions']]
df_completion_rates.head()
shoppingStage date sessions_with_product_views sessions_with_add_to_cart sessions_with_checkout sessions_with_transactions
0 2021-12-01 58.885017 11.149826 8.710801 9.059233
1 2021-12-02 57.538462 11.076923 7.384615 8.307692
2 2021-12-03 62.605042 17.647059 10.504202 11.764706
3 2021-12-04 56.279070 11.162791 6.046512 6.976744
4 2021-12-05 62.573099 19.590643 12.280702 11.988304

Checkout abandonment metrics

You can use a similar approach to filter the Pandas dataframe to show only the abandonment metrics, simply by passing in a list of the metrics you want to keep.

df_abandonment_metrics = df[['date', 'ALL_VISITS', 'NO_SHOPPING_ACTIVITY',
                             'NO_CART_ADDITION', 'CART_ABANDONMENT', 'CHECKOUT_ABANDONMENT']]
df_abandonment_metrics.head()
shoppingStage date ALL_VISITS NO_SHOPPING_ACTIVITY NO_CART_ADDITION CART_ABANDONMENT CHECKOUT_ABANDONMENT
0 2021-12-01 287.0 113.0 137.0 4.0 7.0
1 2021-12-02 325.0 135.0 146.0 10.0 7.0
2 2021-12-03 238.0 88.0 106.0 11.0 5.0
3 2021-12-04 215.0 92.0 96.0 9.0 3.0
4 2021-12-05 342.0 124.0 146.0 21.0 10.0

Checkout abandonment rates

Finally, we can calculate the abandonment rates. Unlike the completion rates, which all use ALL_VISITS, the abandonment rates need to use specific metrics stored in ga:shoppingStage to ensure they show exactly what we need.

df_abandonment_rates = df.copy()
df_abandonment_rates = df_abandonment_rates.assign(\
    sessions_with_no_shipping_activity = ((df_abandonment_rates['NO_SHOPPING_ACTIVITY'] / df_abandonment_rates['ALL_VISITS'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
    sessions_with_no_cart_addition = ((df_abandonment_rates['NO_CART_ADDITION'] / df_abandonment_rates['PRODUCT_VIEW'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
    sessions_with_cart_abandonment = ((df_abandonment_rates['CART_ABANDONMENT'] / df_abandonment_rates['ADD_TO_CART'])*100))
df_abandonment_rates = df_abandonment_rates.assign(\
    sessions_with_checkout_abandonment = ((df_abandonment_rates['CHECKOUT_ABANDONMENT'] / df_abandonment_rates['CHECKOUT'])*100))
df_abandonment_rates = df_abandonment_rates[['date', 'sessions_with_no_shipping_activity', 'sessions_with_no_cart_addition', 
                                             'sessions_with_cart_abandonment', 'sessions_with_checkout_abandonment']]

And that’s it. You can now extract completion and abandonment data from the Google Analytics API, even though Google doesn’t let you do this by default. It’s fairly easy to change the payload to turn these data into a report that you can use to monitor the metrics over time to help your ecommerce team focus on improvement and keep an eye out for potential site issues.

df_abandonment_rates.head()
shoppingStage date sessions_with_no_shipping_activity sessions_with_no_cart_addition sessions_with_cart_abandonment sessions_with_checkout_abandonment
0 2021-12-01 39.372822 81.065089 12.500000 28.000000
1 2021-12-02 41.538462 78.074866 27.777778 29.166667
2 2021-12-03 36.974790 71.140940 26.190476 20.000000
3 2021-12-04 42.790698 79.338843 37.500000 23.076923
4 2021-12-05 36.257310 68.224299 31.343284 23.809524

Matt Clarke, Tuesday, January 25, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.