How to audit a site's Core Web Vitals using Python

Core Web Vitals are performance metrics that measure the quality of the user experience and are now a search ranking factor. Here’s how to access them.


Back in 2020, Google introduced Web Vitals, a set of metrics designed to help site owners optimise the user experience on their websites, so pages are quick to render, load, and interact with, whatever device is being used.

To encourage site owners to improve their sites, and make the overall internet experience better for us all, Google is soon to make a subset of these metrics a ranking factor. This means it will be even more important for those working in technical SEO to ensure their sites conform, in order to maintain or improve their search engine rankings.

What are the Core Web Vitals?

The Core Web Vitals are the subset of the Web Vitals metrics deemed to be most critical by Google. At present, there are only a few Web Vitals that count as “Core” metrics, and they cover loading, interactivity, and visual stability. However, I suspect this is likely to change with time.

Crucially, all of these metrics are aimed at making the user experience better, so even if they weren’t a ranking factor, they’d still be worth optimising to ensure your site works as well as it can. Google recommends aiming above the 75% mark for all of them.

Core Web Vitals

The “Core” metrics in Web Vitals at present include Largest Contentful Paint (LCP), First Input Delay (FID), and Cumulative Layout Shift (CLS):

Largest Contentful Paint (LCP) LCP measures the perceived load time of a page, and is based on the point in the page load timeline at which the page's main content should have loaded. In practice, it tells you how long it takes for the largest content element to render.
First Input Delay (FID) FID is a measure of the page's responsiveness, and quantifies the experience a visitor has when they first try to interact with the page. While the browser's main thread is busy executing JavaScript, it can't respond to clicks or taps, and it's this delay that FID measures.
Cumulative Layout Shift (CLS) CLS is a measure of visual stability and measures the amount of unexpected shift in the page layout. If your page jumps around during the page load because image sizes aren't defined, then you'll have a poor CLS score.
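Google publishes “good”, “needs improvement”, and “poor” bands for each of these field metrics, which are handy to keep alongside any audit script. A small sketch using those published cut-offs (the helper and dictionary names are my own):

```python
# Google's published bands for the Core Web Vitals (good / poor cut-offs).
# LCP is in seconds, FID in milliseconds, and CLS is unitless.
THRESHOLDS = {
    "lcp": (2.5, 4.0),
    "fid": (100, 300),
    "cls": (0.1, 0.25),
}

def rate(metric, value):
    """Classify a field measurement as good, needs improvement, or poor."""
    good, poor = THRESHOLDS[metric]
    if value <= good:
        return "good"
    elif value <= poor:
        return "needs improvement"
    return "poor"
```

For example, `rate("lcp", 1.8)` returns `"good"`, while an LCP above 4 seconds is rated `"poor"`.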
Other important web vitals

Alongside the Core Web Vitals, there are a number of other important Web Vitals you should monitor. I suspect these, too, may become ranking factors in the years to come.

First Contentful Paint (FCP) The First Contentful Paint or FCP is the point at which the Document Object Model (or DOM) first starts filling in the page content, so the user knows something is about to happen.
Time to Interactive (TTI) Time to Interactive is another metric that aims to tell you how long it takes before the page responds to user interaction or input, and everything has fully loaded.
Total Blocking Time (TBT) Total Blocking Time is another page responsiveness metric and measures the time between First Contentful Paint and Time to Interactive. During this period, the user is blocked from interacting with the page as it hasn't completely loaded.
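Of these, TBT is the easiest to reason about with a toy calculation: any main-thread task longer than 50ms is a “long task”, and only the portion beyond 50ms counts as blocking. A quick sketch:

```python
def total_blocking_time(task_durations_ms):
    """Sum the 'blocking' portion of each main-thread task between FCP
    and TTI: only the time beyond the 50ms long-task threshold counts."""
    return sum(max(0, duration - 50) for duration in task_durations_ms)

# Three tasks of 30ms, 120ms, and 250ms block for 0 + 70 + 200 = 270ms
total_blocking_time([30, 120, 250])
```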

The contribution to the overall score depends on the weighting for each metric, and some are more important than others. You’ll maximise your dev team’s ROI by focusing on the important ones first, and then do the fine-tuning in your subsequent site releases.

In this project, we’ll use Python to calculate the Core Web Vitals (and some other useful metrics) for a range of pages on a website across devices using the Google Core Web Vitals API. By running this report every week or month, you’ll be able to see the change in this important KPI.

Load the packages

Open a Jupyter notebook and import the json, urllib, and pandas packages. The first two are part of the Python standard library, so you’ll only need to install Pandas if you don’t already have it.

import json
import pandas as pd
import urllib.request
import urllib.parse

Create an API key

While you do not need an API key to access the Core Web Vitals API, I would highly recommend that you get one. Without an API key you will meet ridiculous levels of throttling and nearly every report you run is likely to return “HTTPError: HTTP Error 429: Too Many Requests” for exceeding the quota. You can generate an API key on the Google Developers site.

key = "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

Query the Core Web Vitals API

Next, we’ll create a simple function to format our query and pass it to the Core Web Vitals API. This takes the URL, the API key, and the strategy (i.e. desktop or mobile), and returns a JSON object containing the API response. If you’re using this in production, rather than in a Jupyter notebook, you’ll need some error handling code in here, as the API isn’t 100% reliable.

def query(url, key, strategy="desktop"):
    """Query the PageSpeed Insights API and return the parsed JSON response."""

    # Encode the query string parameters safely, so URLs containing
    # special characters don't break the request
    params = urllib.parse.urlencode({
        "strategy": strategy,
        "url": url,
        "key": key,
    })
    endpoint = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed?" + params

    response = urllib.request.urlopen(endpoint).read().decode('UTF-8')
    data = json.loads(response)

    return data
report = query("http://flyandlure.org", key, strategy="mobile")
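For production use, a small retry wrapper goes a long way, since the API occasionally responds with errors such as HTTP 429 or 500. A minimal sketch (the function name and backoff values are my own choices):

```python
import time
import urllib.error

def with_retries(fn, *args, retries=3, backoff=5, **kwargs):
    """Call fn(*args, **kwargs), retrying on HTTPError with a linearly
    increasing wait between attempts. Re-raises after the final failure."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except urllib.error.HTTPError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))
```

You’d then call `report = with_retries(query, "http://flyandlure.org", key, strategy="mobile")` instead of calling `query()` directly.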

Save the report file

The Core Web Vitals API returns an enormous quantity of data in its JSON output, so you may want to save this, either to keep it for historic analysis, or so you can analyse it in a JSON file viewer that supports tree view and node collapsing. The report is really easy to save using open() and json.dump().

def save_report(report, filename):
    """Save the Core Web Vitals JSON report to file. 

    Args:
        report (object): JSON object containing report data. 
        filename (string): Filename to use for report.

    Returns:
        None. The report is written to the given filename. 
    """

    with open(filename, 'w') as outfile:
        json.dump(report, outfile)
save_report(report, 'core_web_vitals.json')
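Reloading a saved report for later analysis is just as straightforward with json.load(); a small helper mirroring save_report():

```python
import json

def load_report(filename):
    """Reload a previously saved Core Web Vitals JSON report from file."""
    with open(filename) as infile:
        return json.load(infile)
```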

Parse the Core Web Vitals data

The Core Web Vitals API JSON includes far more information than the average person is going to use, so we can afford to be a little selective and extract only the important elements from the data for our report. Since we’re going to be storing data from multiple pages, at different times, and for different form factors (i.e. desktop, mobile), we need to grab these items first.

final_url = report['lighthouseResult']['finalUrl']
fetch_time = report['lighthouseResult']['fetchTime']
form_factor = report['lighthouseResult']['configSettings']['formFactor']

Next, we’ll extract the main metrics from the Core Web Vitals data. There are tons of additional ones, but these are the ones that matter most from an SEO perspective. They comprise: the overall score, the speed index, the first meaningful paint, the first contentful paint, the interactive score, the total blocking time score, and the cumulative layout shift score.

overall_score = report["lighthouseResult"]["categories"]["performance"]["score"] * 100
speed_index = report["lighthouseResult"]["audits"]["speed-index"]["score"] * 100
first_meaningful_paint = report["lighthouseResult"]["audits"]["first-meaningful-paint"]["score"] * 100
first_contentful_paint = report["lighthouseResult"]["audits"]["first-contentful-paint"]["score"] * 100
time_to_interactive = report["lighthouseResult"]["audits"]["interactive"]["score"] * 100
total_blocking_time = report["lighthouseResult"]["audits"]["total-blocking-time"]["score"] * 100
cumulative_layout_shift = report["lighthouseResult"]["audits"]["cumulative-layout-shift"]["score"] * 100
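Note that these are the Lighthouse audit scores (0 to 1) scaled to percentages, not the raw metric values. Occasionally an audit comes back without a score, which makes the direct dictionary lookups above raise an error; a defensive helper along these lines (the name is my own) avoids the crash:

```python
def safe_score(report, audit, default=None):
    """Return an audit's score as a percentage, or default when the
    audit is missing from the report or its score is null."""
    score = report["lighthouseResult"]["audits"].get(audit, {}).get("score")
    return score * 100 if score is not None else default
```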

Create a function to return the Core Web Vitals

To tidy this up, we can create a reusable function that takes our JSON report from the Core Web Vitals API, parses out the key features we want to capture, and adds them to a dictionary, so we can access and store the data more easily.

def get_core_web_vitals(report):
    """Return a dictionary containing the Core Web Vitals from the report. 

    Args:
        report (object): JSON object containing report data. 

    Returns:
        data (dict): Dictionary containing the key data. 

    """

    audits = report["lighthouseResult"]["audits"]

    final_url = report['lighthouseResult']['finalUrl']
    fetch_time = report['lighthouseResult']['fetchTime']
    form_factor = report['lighthouseResult']['configSettings']['formFactor']
    overall_score = report["lighthouseResult"]["categories"]["performance"]["score"] * 100
    speed_index = audits["speed-index"]["score"] * 100
    first_meaningful_paint = audits["first-meaningful-paint"]["score"] * 100
    first_contentful_paint = audits["first-contentful-paint"]["score"] * 100
    time_to_interactive = audits["interactive"]["score"] * 100
    total_blocking_time = audits["total-blocking-time"]["score"] * 100
    cumulative_layout_shift = audits["cumulative-layout-shift"]["score"] * 100

    data = {
        'final_url': final_url,
        'fetch_time': fetch_time,
        'form_factor': form_factor,
        'overall_score': overall_score,
        'speed_index': speed_index,    
        'first_meaningful_paint': first_meaningful_paint,
        'first_contentful_paint': first_contentful_paint,
        'time_to_interactive': time_to_interactive,
        'total_blocking_time': total_blocking_time,
        'cumulative_layout_shift': cumulative_layout_shift,
    }

    return data
data = get_core_web_vitals(report)
data
{'final_url': 'http://flyandlure.org/',
 'fetch_time': '2021-01-24T13:40:56.277Z',
 'form_factor': 'mobile',
 'overall_score': 36.0,
 'speed_index': 82.0,
 'first_meaningful_paint': 51.0,
 'first_contentful_paint': 51.0,
 'time_to_interactive': 35.0,
 'total_blocking_time': 25.0,
 'cumulative_layout_shift': 33.0}

Create a list of representative URLs to check

The Core Web Vitals are going to vary according to the strategy you use (i.e. desktop or mobile), and the page type examined, as the underlying content and code structure is likely to be different.

Therefore, to get a better idea of the performance of your site across the key areas, it’s a good idea to create a list of pages of various types, so you can analyse them regularly and see where any problems lie.
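If you don’t already have such a file, you can generate one programmatically by pairing each representative page with both strategies (the example.com URLs below are placeholders for your own pages):

```python
import itertools
import pandas as pd

# Hypothetical representative pages - substitute your own URLs and types
pages = {
    "https://www.example.com/": "general",
    "https://www.example.com/category/widgets": "category",
    "https://www.example.com/product/widget-1": "product",
}

# Cross every page with both strategies to get one row per combination
rows = [
    {"url": url, "strategy": strategy, "page_type": page_type}
    for (url, page_type), strategy in itertools.product(
        pages.items(), ["desktop", "mobile"]
    )
]
pd.DataFrame(rows).to_csv("urls.csv", index=False)
```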

df_urls = pd.read_csv('urls.csv')
df_urls.head(6)
url strategy page_type
0 https://www.ebuyer.com/ desktop general
1 https://www.ebuyer.com/ mobile general
2 https://www.ebuyer.com/989809-crucial-32gb-1x-... desktop product
3 https://www.ebuyer.com/989809-crucial-32gb-1x-... mobile product

Fetch and store the data for each URL

Finally, you can iterate over the rows in your Pandas dataframe of URLs, pass each URL to the API, parse the response, extract the metrics on the Web Vitals, and store the output in a dataframe.

results = []

for index, row in df_urls.iterrows():
    report = query(row['url'], key, strategy=row['strategy'])

    data = get_core_web_vitals(report)
    data['page_type'] = row['page_type']
    results.append(data)

df = pd.DataFrame(results)
df.head(2).T
0 1
final_url https://www.ebuyer.com/ https://www.ebuyer.com/
fetch_time 2021-01-24T13:41:08.240Z 2021-01-24T13:41:17.061Z
form_factor desktop mobile
overall_score 94 72
speed_index 100 100
first_meaningful_paint 99 91
first_contentful_paint 100 99
time_to_interactive 100 39
total_blocking_time 100 96
cumulative_layout_shift 92 81
page_type general general

Repeat this process on a good selection of URLs, spanning every page type on your site, and you should be able to build up a picture of site performance and measure your improvements over time.
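To track that trend, one simple approach is to write each run to a dated CSV, so weekly or monthly runs accumulate a history you can chart (the filename pattern here is just a suggestion):

```python
import pandas as pd
from datetime import date

def archive_results(df, prefix="core_web_vitals"):
    """Write the results dataframe to a dated CSV, so repeated runs
    build up a history of the site's scores over time."""
    filename = f"{prefix}_{date.today().isoformat()}.csv"
    df.to_csv(filename, index=False)
    return filename
```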

Matt Clarke, Friday, March 12, 2021

Matt Clarke is a Digital Director who uses data science in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
