How to access the Google Knowledge Graph Search API

The Google Knowledge Graph powers the Knowledge Panels and infobox elements of Google’s search results. Here’s how you can access it using Python.


The Google Knowledge Graph database includes an astronomical amount of data on almost every topic you can think of, allowing Google to create Knowledge Panels and infoboxes that summarise search results and connect the data to other sources.

In this project, we’ll use Python to connect to the Google Knowledge Graph API, fetch the data on your chosen subject, and reformat the JSON object returned into a neat Pandas dataframe that you can analyse or use in your projects.

Get an API key

First, you’ll need to go to the Google API Console, enable the Google Knowledge Graph Search API, and create an API key to use in your project. Then, open a Jupyter notebook and import the packages below.

import json
import urllib.parse
import requests
import pandas as pd
from requests_html import HTMLSession
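
Rather than pasting the key straight into the notebook, one option is to read it from an environment variable. This is only a minimal sketch: the variable name KG_API_KEY and the load_api_key() helper are assumptions made for illustration, not part of the Google API or the original project.

```python
import os


def load_api_key(env_var="KG_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name KG_API_KEY is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key


# Usage, assuming the variable has been set before starting Jupyter:
# api_key = load_api_key()
```

Keeping the key out of the notebook means you can share or commit the notebook without exposing your credentials.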

Get the source

We’ll create a few helper functions to handle each step in the process, which you can re-use in other projects. First, we’ll take the URL, use Requests-HTML to send a GET request to the server, and return the response object. If the request fails, the function prints the exception and returns None.

def get_source(url):
    """Return the source code for the provided URL. 

    Args: 
        url (string): URL of the page to scrape.

    Returns:
        response (object): HTTP response object from requests_html. 
    """

    try:
        session = HTMLSession()
        response = session.get(url)
        return response

    except requests.exceptions.RequestException as e:
        # Report the failure and return None so callers can check for it.
        print(e)
        return None
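
Because get_source() returns None when the request fails, any code that reads response.text needs to guard against that. The sketch below shows one possible way to do so. The parse_json_response() helper is a hypothetical name, and a SimpleNamespace stands in for a real response object so the example runs without a network connection.

```python
import json
from types import SimpleNamespace


def parse_json_response(response):
    """Return the parsed JSON body of a response-like object, or None.

    Handles a missing response (None) as well as a body that is not valid JSON.
    """
    if response is None:
        return None
    try:
        return json.loads(response.text)
    except (ValueError, AttributeError):
        # json.JSONDecodeError is a subclass of ValueError.
        return None


# Stand-ins for a successful response, a failed request, and a non-JSON body.
good = SimpleNamespace(text='{"itemListElement": []}')
bad = SimpleNamespace(text="<html>Not JSON</html>")

print(parse_json_response(good))
print(parse_json_response(None))
print(parse_json_response(bad))
```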

Run the API query

Next, we’ll create a function called get_knowledge_graph() which takes the api_key and the query we want to search for. We’ll assemble a dictionary of query parameters called params, URL encode it, append it to the endpoint, and pass the resulting URL to get_source(). Finally, we’ll parse response.text from a JSON string into a Python dictionary.

def get_knowledge_graph(api_key, query):
    """Return a Google Knowledge Graph for a given query.

    Args: 
        api_key (string): Google Knowledge Graph API key. 
        query (string): Term to search for.

    Returns:
        response (dict): Knowledge Graph response parsed from JSON.
    """ 
        
    endpoint = 'https://kgsearch.googleapis.com/v1/entities:search'
    params = {
        'query': query,
        'limit': 10,
        'indent': True,
        'key': api_key,
    }

    url = endpoint + '?' + urllib.parse.urlencode(params)    
    response = get_source(url)
    
    return json.loads(response.text)
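
The same URL-building step can be extended to narrow the search. The sketch below is a hedged illustration: build_search_url() is a hypothetical helper, and it assumes the API's optional types and languages parameters can be repeated in the query string, so check the API reference before relying on them. It only builds the URL and makes no request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_search_url(api_key, query, limit=10, types=None, languages=None):
    """Build a Knowledge Graph Search URL with optional filters."""
    params = {"query": query, "limit": limit, "key": api_key}
    if types:
        params["types"] = list(types)          # e.g. ["Organization"]
    if languages:
        params["languages"] = list(languages)  # e.g. ["en"]
    # doseq=True writes list values as repeated parameters (types=A&types=B).
    return ENDPOINT + "?" + urlencode(params, doseq=True)


url = build_search_url("YOUR_API_KEY", "tesla", limit=5,
                       types=["Organization", "Brand"], languages=["en"])

# Decode the query string again to inspect what would be sent.
print(parse_qs(urlsplit(url).query))
```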

Let’s run the function above to see what the results look like. We’ll use the generic phrase tesla, as this could mean several things. The function returns a Python dictionary parsed from the JSON response. Its itemListElement key holds a list of entities, each with a resultScore indicating its relevance to the term we used. The Tesla “Electric car company” comes out on top, among the various other Teslas.

api_key = "HJhs87hjahkjahjdh-Ajhsda7n87aa"
knowledge_graph_json = get_knowledge_graph(api_key, "tesla")
knowledge_graph_json
{'@context': {'resultScore': 'goog:resultScore',
  '@vocab': 'http://schema.org/',
  'kg': 'http://g.co/kg',
  'detailedDescription': 'goog:detailedDescription',
  'goog': 'http://schema.googleapis.com/',
  'EntitySearchResult': 'goog:EntitySearchResult'},
 '@type': 'ItemList',
 'itemListElement': [{'@type': 'EntitySearchResult',
   'resultScore': 3734.17626953125,
   'result': {'detailedDescription': {'articleBody': "Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California. Tesla's current products include electric cars, battery energy storage from home to grid scale, solar panels and solar roof tiles, as well as other related products and services. ",
     'url': 'https://en.wikipedia.org/wiki/Tesla,_Inc.',
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License'},
    'description': 'Electric car company',
    '@type': ['Organization', 'Corporation', 'Thing'],
    'name': 'Tesla, Inc.',
    '@id': 'kg:/m/0dr90d'}},
  {'result': {'description': 'Automobile make',
    '@type': ['Brand', 'Thing'],
    'name': 'Tesla',
    '@id': 'kg:/m/0j6n6s8'},
   '@type': 'EntitySearchResult',
   'resultScore': 1660.609008789062},
  {'resultScore': 733.95361328125,
   '@type': 'EntitySearchResult',
   'result': {'description': 'Rock band',
    '@type': ['MusicGroup', 'Thing'],
    '@id': 'kg:/m/036wfx',
    'name': 'Tesla',
    'detailedDescription': {'url': 'https://en.wikipedia.org/wiki/Tesla_(band)',
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License',
     'articleBody': 'Tesla is an American rock band formed in Sacramento, California in late 1981 by bassist Brian Wheat and guitarist Frank Hannon. Lead vocalist Jeff Keith and drummer Troy Luccketta had joined them by 1984. They are the longest serving members and have appeared on all band\'s releases. Their current guitarist is Dave Rude, who replaced founding member Tommy Skeoch in 2006. Originally named City Kidd, the band changed from its glam-derived sound to a "rootsier" direction under a new name: Tesla. In 1996, the band disbanded, with members devoting themselves to solo projects. In 2000, they reformed, but Tommy Skeoch departed the band in 2006 and was replaced by Dave Rude. They have sold 14 million albums in the United States.'}}},
  {'resultScore': 702.0634155273438,
   '@type': 'EntitySearchResult',
   'result': {'detailedDescription': {'url': 'https://en.wikipedia.org/wiki/Nvidia_Tesla',
     'articleBody': "Nvidia Tesla was the name of Nvidia's line of products targeted at stream processing or general-purpose graphics processing units, named after pioneering electrical engineer Nikola Tesla. ",
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License'},
    '@id': 'kg:/g/1pz2tx6nq',
    '@type': ['ProductModel', 'Thing'],
    'description': 'Video card',
    'name': 'Nvidia Tesla'}},
  {'@type': 'EntitySearchResult',
   'result': {'name': 'Tesla',
    '@id': 'kg:/m/03rhvb',
    'detailedDescription': {'articleBody': 'The tesla is a derived unit of the magnetic induction in the International System of Units.\nOne tesla is equal to one weber per square metre. ',
     'url': 'https://en.wikipedia.org/wiki/Tesla_(unit)',
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License'},
    '@type': ['Thing'],
    'description': 'Unit profile'},
   'resultScore': 645.8308715820312},
  {'resultScore': 509.4024047851562,
   '@type': 'EntitySearchResult',
   'result': {'name': 'Ypohthonios',
    '@id': 'kg:/m/0nhyb1k',
    'description': 'Singer',
    '@type': ['Person', 'Thing']}},
  {'result': {'name': 'Tesla Powerwall',
    'detailedDescription': {'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License',
     'url': 'https://en.wikipedia.org/wiki/Tesla_Powerwall',
     'articleBody': 'The Powerwall and Powerpack are rechargeable lithium-ion battery stationary energy storage products manufactured by Tesla, Inc. '},
    '@id': 'kg:/m/0134_cql',
    '@type': ['Thing']},
   '@type': 'EntitySearchResult',
   'resultScore': 393.5823974609375},
  {'resultScore': 363.5635986328125,
   '@type': 'EntitySearchResult',
   'result': {'detailedDescription': {'articleBody': 'A Tesla coil is an electrical resonant transformer circuit designed by inventor Nikola Tesla in 1891. It is used to produce high-voltage, low-current, high frequency alternating-current electricity. ',
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License',
     'url': 'https://en.wikipedia.org/wiki/Tesla_coil'},
    'name': 'Tesla coil',
    '@type': ['Thing'],
    '@id': 'kg:/m/09rh1'}},
  {'result': {'@id': 'kg:/m/0fp_b55',
    'description': 'Automobile company',
    'name': 'Tesla Fremont Factory',
    'detailedDescription': {'url': 'https://en.wikipedia.org/wiki/Tesla_Fremont_Factory',
     'articleBody': "Tesla's Fremont Factory is an automobile manufacturing plant in Fremont, California, operated by Tesla, Inc. The facility opened as the General Motors Fremont Assembly in 1962, and was later operated by NUMMI, a former GM–Toyota joint venture. ",
     'license': 'https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License'},
    '@type': ['Organization', 'Corporation', 'Thing']},
   'resultScore': 126.9559783935547,
   '@type': 'EntitySearchResult'},
  {'resultScore': 119.4245147705078,
   '@type': 'EntitySearchResult',
   'result': {'@id': 'kg:/g/11c3j593xz',
    'name': 'Tesla Engineering',
    'url': 'http://www.tesla.co.uk',
    '@type': ['Organization', 'Thing']}}]}
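
Since the response is an ordinary dictionary, you can pull out just the fields you need with plain Python. The sketch below uses a trimmed-down copy of the response above as sample data. The summarise_entities() helper and its min_score argument are hypothetical names chosen for illustration.

```python
def summarise_entities(knowledge_graph_json, min_score=0.0):
    """Return name, description, types and score for each entity.

    Uses .get() throughout because not every entity has every field
    (for example, some results above have no description).
    """
    rows = []
    for element in knowledge_graph_json.get("itemListElement", []):
        score = element.get("resultScore", 0.0)
        if score < min_score:
            continue
        result = element.get("result", {})
        rows.append({
            "name": result.get("name"),
            "description": result.get("description"),
            "types": result.get("@type", []),
            "score": score,
        })
    # Highest-scoring entities first.
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Trimmed sample taken from the response shown above.
sample = {
    "itemListElement": [
        {"resultScore": 393.5823974609375,
         "result": {"name": "Tesla Powerwall", "@type": ["Thing"]}},
        {"resultScore": 3734.17626953125,
         "result": {"name": "Tesla, Inc.",
                    "description": "Electric car company",
                    "@type": ["Organization", "Corporation", "Thing"]}},
    ]
}

for row in summarise_entities(sample, min_score=100):
    print(row["name"], row["description"], row["score"])
```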

Convert the Knowledge Graph to a Pandas dataframe

Although the nested data is ideal for parsing and using in other Python code, it’s not particularly readable. However, we can use the Pandas function json_normalize() to flatten it into a tidy dataframe. Passing record_path='itemListElement' tells Pandas to create one row per entity, and nested keys become dotted column names such as result.name. That’s all there is to it. We’ve now queried the API, fetched our data, and reformatted it so we can use it as we wish.

def knowledge_graph_to_df(knowledge_graph_json):
    """Return a Pandas dataframe containing a Google Knowledge Graph.

    Args: 
        knowledge_graph_json (dict): Parsed Google Knowledge Graph JSON.

    Returns:
        dataframe (object): Knowledge Graph in Pandas dataframe.
    """

    # Each entry in itemListElement becomes one row of the dataframe.
    return pd.json_normalize(knowledge_graph_json, record_path='itemListElement')

df = knowledge_graph_to_df(knowledge_graph_json)
df.head()
  | @type | resultScore | result.detailedDescription.articleBody | result.detailedDescription.url | result.detailedDescription.license | result.description | result.@type | result.name | result.@id | result.url
0 | EntitySearchResult | 3734.176270 | Tesla, Inc. is an American electric vehicle an... | https://en.wikipedia.org/wiki/Tesla,_Inc. | https://en.wikipedia.org/wiki/Wikipedia:Text_o... | Electric car company | [Organization, Corporation, Thing] | Tesla, Inc. | kg:/m/0dr90d | NaN
1 | EntitySearchResult | 1660.609009 | NaN | NaN | NaN | Automobile make | [Brand, Thing] | Tesla | kg:/m/0j6n6s8 | NaN
2 | EntitySearchResult | 733.953613 | Tesla is an American rock band formed in Sacra... | https://en.wikipedia.org/wiki/Tesla_(band) | https://en.wikipedia.org/wiki/Wikipedia:Text_o... | Rock band | [MusicGroup, Thing] | Tesla | kg:/m/036wfx | NaN
3 | EntitySearchResult | 702.063416 | Nvidia Tesla was the name of Nvidia's line of ... | https://en.wikipedia.org/wiki/Nvidia_Tesla | https://en.wikipedia.org/wiki/Wikipedia:Text_o... | Video card | [ProductModel, Thing] | Nvidia Tesla | kg:/g/1pz2tx6nq | NaN
4 | EntitySearchResult | 645.830872 | The tesla is a derived unit of the magnetic in... | https://en.wikipedia.org/wiki/Tesla_(unit) | https://en.wikipedia.org/wiki/Wikipedia:Text_o... | Unit profile | [Thing] | Tesla | kg:/m/03rhvb | NaN
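
To see where dotted column names like result.detailedDescription.url come from, here is a rough plain-Python sketch of the flattening idea. It is not how Pandas implements json_normalize() internally, and the flatten() helper is a hypothetical name. It simply joins nested dictionary keys with a dot and leaves lists untouched, which matches the columns shown above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the prefix along.
            flat.update(flatten(value, column, sep))
        else:
            # Scalars and lists are kept as a single value.
            flat[column] = value
    return flat


# One entity from the response above, with the long text fields left out.
entity = {
    "@type": "EntitySearchResult",
    "resultScore": 3734.17626953125,
    "result": {
        "name": "Tesla, Inc.",
        "@type": ["Organization", "Corporation", "Thing"],
        "detailedDescription": {
            "url": "https://en.wikipedia.org/wiki/Tesla,_Inc.",
        },
    },
}

for column, value in flatten(entity).items():
    print(column, "=", value)
```

Entities that lack a nested key simply produce no entry for that column, which is why Pandas fills those cells with NaN in the dataframe.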

Matt Clarke, Saturday, March 13, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
