To extract specific data from the Google Analytics API you will often need to use segments and filters to ensure you get the data you want. For example, you might want to find all the data on visitors with the `userType` of `New Visitor`, so you’d need to pass this to your Google Analytics API query using an operator.
Operators specify how a given filter or segment should be matched against your GA data. For example, you can use the `==` operator to find all the data on visitors with the `userType` of `New Visitor`, or you could use the `!=` operator to find all the data on visitors who are not `New Visitor`.
In this post we’ll look at the different operators and how to use them in your Google Analytics API queries. We’ll be using my GAPandas Python package designed to make it quick and easy to query the Google Analytics API using Python and Pandas.
The Google Analytics API includes query operators to handle: equals; does not equal; greater than; less than; greater than or equal to; and less than or equal to. In addition, there are operators that identify whether a string contains a substring or not, or matches a regular expression (or regex) or not.
Since Google Analytics metrics contain numeric data and Google Analytics dimensions contain categorical data, you can only use certain operators on certain types of data. These are known as valid combinations. If you attempt to use an invalid combination, you’ll get an error.
Here’s a summary of the available operators and the data types they will work with.
Operator | Description | Works with |
---|---|---|
== | Equals | Metrics and dimensions |
!= | Does not equal | Metrics and dimensions |
> | Greater than | Metrics only |
< | Less than | Metrics only |
>= | Greater than or equal to | Metrics only |
<= | Less than or equal to | Metrics only |
=@ | Contains substring | Dimensions only |
!@ | Does not contain substring | Dimensions only |
=~ | Matches regex | Dimensions only |
!~ | Does not match regex | Dimensions only |
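The valid combinations in the table can be captured in a small lookup. The sketch below is illustrative only: `VALID_OPERATORS` and `make_filter` are hypothetical helpers, not part of GAPandas or the Google Analytics API, and the caller is assumed to know whether a field is a metric or a dimension.

```python
# Hypothetical lookup that mirrors the operator table above.
VALID_OPERATORS = {
    "metric": {"==", "!=", ">", "<", ">=", "<="},
    "dimension": {"==", "!=", "=@", "!@", "=~", "!~"},
}


def make_filter(field, operator, value, kind):
    """Return a filter expression such as 'ga:country==United Kingdom'.

    kind must be 'metric' or 'dimension'. A ValueError is raised when the
    operator is not valid for that kind of field.
    """
    if kind not in VALID_OPERATORS:
        raise ValueError(f"Unknown field kind: {kind!r}")
    if operator not in VALID_OPERATORS[kind]:
        raise ValueError(f"Operator {operator!r} cannot be used with a {kind}")
    return f"{field}{operator}{value}"


# A regex match on a dimension and a numeric comparison on a metric.
print(make_filter("ga:country", "=~", "^United", "dimension"))
print(make_filter("ga:sessions", ">=", 100, "metric"))
```

Checking the combination locally like this only saves a round trip; the API remains the final judge of what it accepts.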
You can apply Google Analytics operators in two places in an API query: the `filters` and the `segment`. As the name suggests, `filters` are used to filter your data and are the more basic of the two.
To use an operator with a Google Analytics filter using the API you need to give the metric or dimension with its `ga:` prefix, followed by your operator and the value. For example, the filter `ga:country==United Kingdom` will return only sessions where the `ga:country` dimension was set to `United Kingdom`.
When passing operators to the API you ordinarily need to URL encode them, so `==` would become `%3D%3D`. However, GAPandas will automatically handle the URL encoding for you so you can just enter them in their unencoded form.
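If you ever need to build a request without GAPandas, the encoding step can be reproduced with Python’s standard library. This is only a sketch of the encoding itself, using `urllib.parse.quote`; it does not show how GAPandas does it internally.

```python
from urllib.parse import quote

raw_filter = "ga:country==United Kingdom"

# safe="" tells quote() to percent-encode every reserved character,
# including ":" and "=", and spaces become %20.
encoded_filter = quote(raw_filter, safe="")

print(encoded_filter)  # ga%3Acountry%3D%3DUnited%20Kingdom
```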
In the simple example below we’ll use the equals operator `==` to select all data from the API where the `ga:country` dimension is equal to `United Kingdom`. GAPandas will fetch the data from your Google Analytics account and return it in a neatly formatted Pandas dataframe.
import gapandas as gp
service = gp.get_service('client_secrets.json')
view = '1234567'
payload = {
    'start_date': '30daysAgo',
    'end_date': 'today',
    'metrics': 'ga:sessions',
    'dimensions': 'ga:date, ga:country, ga:userType',
    'filters': 'ga:country==United Kingdom'
}
df = gp.run_query(service, view, payload)
df.head()
| | date | country | userType | sessions |
---|---|---|---|---|
0 | 2021-12-26 | United Kingdom | New Visitor | 2627 |
1 | 2021-12-26 | United Kingdom | Returning Visitor | 3177 |
2 | 2021-12-27 | United Kingdom | New Visitor | 3467 |
3 | 2021-12-27 | United Kingdom | Returning Visitor | 3331 |
4 | 2021-12-28 | United Kingdom | New Visitor | 3562 |
If you have a more complex filter you want to run on your data, such as all the sessions from the United Kingdom on mobile devices running iOS, you can chain multiple filters together.
There are two main ways to chain filters: `OR` and `AND`. If you want filters to be combined with an `OR` operator you need to separate each filter expression with a comma. For example, `ga:country==United Kingdom,ga:country==United States` will return all sessions from either `United Kingdom` or `United States`.
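To apply an `AND` operator to multiple filters you need to separate them with a semicolon. For example, `ga:country==United Kingdom;ga:userType==New Visitor` will return only sessions that are both from the `United Kingdom` and made by a `New Visitor`. Note that `ga:country==United Kingdom;ga:country==United States` would return no rows, because a single session cannot belong to both countries. To get sessions from either country you need the comma (`OR`) operator, as in the query below.
payload = {
    'start_date': '30daysAgo',
    'end_date': 'today',
    'metrics': 'ga:sessions',
    'dimensions': 'ga:date, ga:country, ga:userType',
    'filters': 'ga:country==United Kingdom,ga:country==United States'
}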
df = gp.run_query(service, view, payload)
df.head()
| | date | country | userType | sessions |
---|---|---|---|---|
0 | 2021-12-26 | United Kingdom | New Visitor | 2627 |
1 | 2021-12-26 | United Kingdom | Returning Visitor | 3177 |
2 | 2021-12-26 | United States | New Visitor | 264 |
3 | 2021-12-26 | United States | Returning Visitor | 41 |
4 | 2021-12-27 | United Kingdom | New Visitor | 3467 |
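The comma and semicolon rules described above can be wrapped in a small function so the filter string is assembled rather than typed by hand. This is a minimal sketch: `combine_filters` is a hypothetical helper, not something GAPandas provides, and it assumes each expression is already a valid filter.

```python
def combine_filters(groups):
    """Join filter expressions into a single filter string.

    groups is a list of lists. Expressions inside an inner list are ORed
    together with commas, and the inner lists are ANDed with semicolons.
    """
    return ";".join(",".join(group) for group in groups)


# Sessions from either country (OR).
either_country = combine_filters([
    ["ga:country==United Kingdom", "ga:country==United States"],
])

# Sessions from the United Kingdom made by new visitors (AND).
uk_new_visitors = combine_filters([
    ["ga:country==United Kingdom"],
    ["ga:userType==New Visitor"],
])

print(either_country)
print(uk_new_visitors)
```

Because the API treats commas and semicolons as operators, a literal comma or semicolon inside a value would need to be backslash-escaped before being passed to a helper like this.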
The other way to select specific data from your Google Analytics account via the API is to use the `segment` parameter. The neat thing about a `segment` is that you can set it to extract data based on either the `session` or the `user` to group the data differently. You can also chain lots of conditions together to create very sophisticated queries.
In the example below we’ll create a segment that will extract all sessions with a `sessionDuration` of greater than 90 seconds. If you wanted to do the same for users, you’d replace `sessions::condition::` with `users::condition::` at the start of the segment, before your first dimension or metric.
payload = {
    'start_date': '30daysAgo',
    'end_date': 'today',
    'metrics': 'ga:sessionDuration',
    'dimensions': 'ga:date, ga:country, ga:userType',
    'segment': 'sessions::condition::ga:sessionDuration>90'
}
df = gp.run_query(service, view, payload)
df.head()
| | date | country | userType | sessionDuration |
---|---|---|---|---|
0 | 2021-12-26 | France | New Visitor | 3759.0 |
1 | 2021-12-26 | Guernsey | Returning Visitor | 104.0 |
2 | 2021-12-26 | Ireland | New Visitor | 463.0 |
3 | 2021-12-26 | Israel | Returning Visitor | 394.0 |
4 | 2021-12-26 | Jersey | New Visitor | 928.0 |
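Since the only difference between a session-based and a user-based segment is the prefix, the string can be built with a small function. `build_segment` below is a hypothetical helper written for illustration, not part of GAPandas, and it only covers the `condition` style of segment used in this post.

```python
def build_segment(scope, condition):
    """Return a segment string such as 'sessions::condition::ga:sessionDuration>90'.

    scope must be 'sessions' or 'users'; condition is the expression that
    follows the prefix, for example 'ga:sessionDuration>90'.
    """
    if scope not in ("sessions", "users"):
        raise ValueError(f"scope must be 'sessions' or 'users', got {scope!r}")
    return f"{scope}::condition::{condition}"


print(build_segment("sessions", "ga:sessionDuration>90"))
print(build_segment("users", "ga:sessionDuration>90"))
```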
Next, we’ll combine a few different operators to create a more complex segment. `sessions::condition::ga:sessionDuration>90;ga:country==Guernsey,ga:country==Jersey` will extract all sessions where the `sessionDuration` is greater than 90 seconds and the `country` is either `Guernsey` or `Jersey`.
Note that we used a semicolon `;` as the AND operator between the two conditions, while the comma `,` is the OR operator. The API evaluates OR before AND, so the segment selects sessions longer than 90 seconds whose country is either `Guernsey` or `Jersey`.
payload = {
    'start_date': '30daysAgo',
    'end_date': 'today',
    'metrics': 'ga:sessionDuration',
    'dimensions': 'ga:date, ga:country, ga:userType',
    'segment': 'sessions::condition::ga:sessionDuration>90;ga:country==Guernsey,ga:country==Jersey'
}
df = gp.run_query(service, view, payload)
df.head()
| | date | country | userType | sessionDuration |
---|---|---|---|---|
0 | 2021-12-26 | Guernsey | Returning Visitor | 104.0 |
1 | 2021-12-26 | Jersey | New Visitor | 928.0 |
2 | 2021-12-26 | Jersey | Returning Visitor | 3674.0 |
3 | 2021-12-27 | Guernsey | Returning Visitor | 710.0 |
4 | 2021-12-27 | Jersey | New Visitor | 507.0 |
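To make the logic of the combined segment concrete, the sketch below applies the same condition to a handful of made-up session records in plain Python. The records and the `matches` function are purely illustrative; the real segment is evaluated by Google Analytics against its own session data, not against the rows of the dataframe.

```python
# Made-up session records, used only to illustrate the boolean logic.
sessions = [
    {"country": "Guernsey", "sessionDuration": 104},
    {"country": "Guernsey", "sessionDuration": 45},
    {"country": "Jersey", "sessionDuration": 928},
    {"country": "France", "sessionDuration": 3759},
]


def matches(session):
    """Mirror ga:sessionDuration>90;ga:country==Guernsey,ga:country==Jersey."""
    long_enough = session["sessionDuration"] > 90
    # The comma (OR) binds more tightly than the semicolon (AND).
    in_islands = session["country"] == "Guernsey" or session["country"] == "Jersey"
    return long_enough and in_islands


selected = [s["country"] for s in sessions if matches(s)]
print(selected)  # ['Guernsey', 'Jersey']
```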
Matt Clarke, Tuesday, January 25, 2022