How to query the Google Analytics Data API for GA4 using Python

Learn how to query the Google Analytics Data API for GA4 using Python with GAPandas4 to fetch your GA data and display it in a Pandas dataframe.

How to query the Google Analytics Data API for GA4 using Python
Picture by Josh Sorenson, Pexels.
19 minutes to read

Google recently announced that it will be sunsetting Universal Analytics and replacing it with Google Analytics 4. The news sent shock waves through the ecommerce and marketing world, as it meant we’ll lose all our previous data, and we’ll have to migrate to GA4 and start using it.

For data scientists like me, who’ve spent weeks building reporting systems and models that pull data from the Google Analytics Reporting API, this is also somewhat irksome. I wrote the GAPandas Python package to make it quicker and easier for me to query the GA API and use it heavily in my work.

By June 2023, GAPandas and other code that queries the Google Analytics Reporting API used by Universal Analytics will cease to work or have access to fresh data, as Universal Analytics is being killed off or sunsetted.

Seeing as I’ve now got lots of reports to rewrite, I figured it wise to create a new version of GAPandas that supports the Google Analytics Data API used by GA4. At the moment it’s very alpha, but does work and is a quick and easy way to access GA4 data in Python. Here’s how to use it.

Install the package

GAPandas4 will work in Jupyter, Google Colab, or a Python script. First, you’ll need to install the Python package from my GitHub repository. You can do that in Jupyter or Colab using the command below. This will install the right version of Pandas, plus the Google Analytics Data package that GAPandas4 forms a helpful wrapper around. Once installed, import the package and give it an alias.

!pip3 install git+https://github.com/practical-data-science/gapandas4.git
import gapandas4 as gp

Configure your settings

Next, create a couple of variables in which to store your settings. The service_account variable needs to hold the filename and path to your Google Service Account client secrets JSON keyfile. The Service Account needs to be granted access to the Google Analytics Data API and you’ll need to add the email of the service account user as a user on your Google Analytics 4 property. You’ll also need the number of the Google Analytics 4 property you wish to query.

service_account = 'client_secrets.json'
property_id = '123456789'

Create your request

Unlike the previous version, the GA4 API uses Protocol Buffers or Protobufs to accept requests. These are a bit clunkier to create and have a very specific syntax. You’ll need to pass in the property_id with every request, plus the dimensions and metrics you want to return and the date_ranges.

Once you’ve constructed your request we’ll pass this to query() with the service_account and GAPandas4 will run the query and return the data in a Pandas dataframe, or a list of Pandas dataframes if you passed a batch report request.

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
)

df = gp.query(service_account, request)
df.head()
country city activeUsers
0 United Kingdom London 2191
1 United Kingdom (not set) 805
2 United Kingdom Birmingham 202
3 United Kingdom Manchester 105
4 United Kingdom Leeds 95

Full code example

Here’s the full code example required to construct a simple GA4 API query and return the output in a Pandas dataframe. This should be all you need to get started with regular reports. You can also use GAPandas4 to run batch reports, pivot reports, batch pivot reports, and realtime reports.

import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = '123456789'

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
)

df = gp.query(service_account, request)
df.head()
country city activeUsers
0 United Kingdom London 2191
1 United Kingdom (not set) 805
2 United Kingdom Birmingham 202
3 United Kingdom Manchester 105
4 United Kingdom Leeds 95

Dimension, Metric, and DateRange example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and OrderBy metric example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    order_bys=[gp.OrderBy(metric=gp.OrderBy.MetricOrderBy(metric_name='activeUsers'), desc=True)]
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and OrderBy dimension example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    order_bys=[gp.OrderBy(dimension=gp.OrderBy.DimensionOrderBy(dimension_name='country'), desc=False)]
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and include Filter example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    dimension_filter=gp.FilterExpression(
        filter=gp.Filter(
            field_name="country",
            string_filter=gp.Filter.StringFilter(value="Australia"),
            )
        ),
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and exclude Filter example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    dimension_filter=gp.FilterExpression(
        not_expression=gp.FilterExpression(
            filter=gp.Filter(
                field_name="country",
                string_filter=gp.Filter.StringFilter(value="United Kingdom"),
            )
        )
    ),
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and in list Filter example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    dimension_filter=gp.FilterExpression(
        filter=gp.Filter(
            field_name="country",
            in_list_filter=gp.Filter.InListFilter(
                values=[
                     "United Kingdom",
                    "Ireland",
                ]
            ),
        )
    ),
)

df = gp.query(service_account, request)
print(df.head())

Dimension, Metric, DateRange, and not in list Filter example

request = gp.RunReportRequest(
    property=f"properties/{property_id}",
    dimensions=[
        gp.Dimension(name="country"),
        gp.Dimension(name="city")
    ],
    metrics=[
        gp.Metric(name="activeUsers")
    ],
    date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")],
    dimension_filter=gp.FilterExpression(
        not_expression=gp.FilterExpression(
            filter=gp.Filter(
                field_name="country",
                in_list_filter=gp.Filter.InListFilter(
                    values=[
                        "United Kingdom",
                        "Ireland",
                    ]
                ),
            )
        )
    ),
)

df = gp.query(service_account, request)
print(df.head())

Batch reports

The Google Analytics Data API supports making batch requests where you can pass up to five API requests at once. GAPandas4 will process each one and return the Pandas dataframes in a Python list so you can access them.

import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'


batch_report_request = gp.BatchRunReportsRequest(
    property=f"properties/{property_id}",
    requests=[
        gp.RunReportRequest(
            dimensions=[
                gp.Dimension(name="country"),
                gp.Dimension(name="city")
            ],
            metrics=[
                gp.Metric(name="activeUsers")
            ],
            date_ranges=[gp.DateRange(start_date="2022-06-01", end_date="2022-06-01")]
        ),
        gp.RunReportRequest(
            dimensions=[
                gp.Dimension(name="country"),
                gp.Dimension(name="city")
            ],
            metrics=[
                gp.Metric(name="activeUsers")
            ],
            date_ranges=[gp.DateRange(start_date="2022-06-02", end_date="2022-06-02")]
        )
    ]
)

df = gp.query(service_account, batch_report_request, report_type="batch_report")
print(df[0].head())
print(df[1].head())

Pivot report

Constructing a report using RunPivotReportRequest() will return pivoted data in a single Pandas dataframe.

import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

pivot_request = gp.RunPivotReportRequest(
    property=f"properties/{property_id}",
    dimensions=[gp.Dimension(name="country"),
                gp.Dimension(name="browser")],
    metrics=[gp.Metric(name="sessions")],
    date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
    pivots=[
        gp.Pivot(
            field_names=["country"],
            limit=5,
            order_bys=[
                gp.OrderBy(
                    dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                )
            ],
        ),
        gp.Pivot(
            field_names=["browser"],
            offset=0,
            limit=5,
            order_bys=[
                gp.OrderBy(
                    metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                )
            ],
        ),
    ],
)

df = gp.query(service_account, pivot_request, report_type="pivot")
print(df.head())

Batch pivot report

Constructing a payload using BatchRunPivotReportsRequest() will allow you to run up to five pivot reports. These are returned as a list of Pandas dataframes.

import gapandas4 as gp

service_account = 'client_secrets.json'
property_id = 'xxxxxxxxx'

batch_pivot_request = gp.BatchRunPivotReportsRequest(
    property=f"properties/{property_id}",
    requests=[
        gp.RunPivotReportRequest(
            dimensions=[gp.Dimension(name="country"),
                        gp.Dimension(name="browser")],
                metrics=[gp.Metric(name="sessions")],
                date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
                pivots=[
                    gp.Pivot(
                        field_names=["country"],
                        limit=5,
                        order_bys=[
                            gp.OrderBy(
                                dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                            )
                        ],
                    ),
                    gp.Pivot(
                        field_names=["browser"],
                        offset=0,
                        limit=5,
                        order_bys=[
                            gp.OrderBy(
                                metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                            )
                        ],
                    ),
                ],
        ),
        gp.RunPivotReportRequest(
            dimensions=[gp.Dimension(name="country"),
                        gp.Dimension(name="browser")],
                metrics=[gp.Metric(name="sessions")],
                date_ranges=[gp.DateRange(start_date="2022-05-30", end_date="today")],
                pivots=[
                    gp.Pivot(
                        field_names=["country"],
                        limit=5,
                        order_bys=[
                            gp.OrderBy(
                                dimension=gp.OrderBy.DimensionOrderBy(dimension_name="country")
                            )
                        ],
                    ),
                    gp.Pivot(
                        field_names=["browser"],
                        offset=0,
                        limit=5,
                        order_bys=[
                            gp.OrderBy(
                                metric=gp.OrderBy.MetricOrderBy(metric_name="sessions"), desc=True
                            )
                        ],
                    ),
                ],
        )
    ]
)

df = gp.query(service_account, batch_pivot_request, report_type="batch_pivot")
print(df[0].head())
print(df[1].head())

Matt Clarke, Wednesday, June 22, 2022

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.