How to use EcommerceTools for technical SEO

Picture by Stephen Phillips, Unsplash.


There’s often a lot of faffing around required to get marketing and ecommerce data from various systems into Pandas so you can analyse it, or use it within more complex models. I built the EcommerceTools Python package to take the hassle out of this process and make it quick and easy to fetch and analyse data.

At the moment, it only does the basics, but it’s still useful for my daily ecommerce and marketing work. The aim is to eventually create a package that includes all the tools I need to analyse ecommerce, marketing, and SEO data and create models. In this article I’ll explain how you can use it to analyse technical SEO data.

Install the packages

To get started, open a Jupyter notebook and install the EcommerceTools Python package from PyPI by running !pip3 install ecommercetools in a code cell. Then import pandas and the seo module from ecommercetools.

from ecommercetools import seo
import pandas as pd

1. Discover XML sitemap locations

First, we’ll take a look at the XML sitemap features. The get_sitemaps() function takes the location of a robots.txt file (always stored at the root of a domain) and returns a Python list containing the URL of each XML sitemap listed within it.

from ecommercetools import seo

sitemaps = seo.get_sitemaps("http://www.bbc.co.uk/robots.txt")
sitemaps

['http://www.bbc.co.uk/sitemaps/index-uk-archive.xml',
 'http://www.bbc.co.uk/sitemaps/index-uk-news.xml',
 'http://www.bbc.co.uk/video_sitemap.xml',
 'http://www.bbc.co.uk/sitemap.xml',
 'https://www.bbc.co.uk/food/sitemap.xml',
 'http://www.bbc.co.uk/sitemap.xml',
 'http://www.bbc.co.uk/mobile_sitemap.xml',
 'http://www.bbc.co.uk/sitemap.xml',
 'https://www.bbc.co.uk/ideas/sitemap.xml']
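
Note that http://www.bbc.co.uk/sitemap.xml appears three times in the list above. If you plan to loop over the sitemaps, you may want to remove the repeats first. Here's a minimal sketch using only the standard library, with the list copied from the output above.

```python
# Sitemap URLs as returned above; sitemap.xml appears three times.
sitemaps = [
    "http://www.bbc.co.uk/sitemaps/index-uk-archive.xml",
    "http://www.bbc.co.uk/sitemaps/index-uk-news.xml",
    "http://www.bbc.co.uk/video_sitemap.xml",
    "http://www.bbc.co.uk/sitemap.xml",
    "https://www.bbc.co.uk/food/sitemap.xml",
    "http://www.bbc.co.uk/sitemap.xml",
    "http://www.bbc.co.uk/mobile_sitemap.xml",
    "http://www.bbc.co.uk/sitemap.xml",
    "https://www.bbc.co.uk/ideas/sitemap.xml",
]

# dict.fromkeys() keeps the first occurrence of each URL and preserves order.
unique_sitemaps = list(dict.fromkeys(sitemaps))
print(len(sitemaps), len(unique_sitemaps))
```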

2. Read an XML sitemap into Pandas

The get_sitemap() function allows you to download the URLs in an XML sitemap to a Pandas dataframe. If the sitemap contains child sitemaps, each of these will be retrieved. You can save the Pandas dataframe to CSV in the usual way.

from ecommercetools import seo

df = seo.get_sitemap("http://flyandlure.org/sitemap.xml")
print(df.head())

	loc	changefreq	priority	domain	sitemap_name
0	http://flyandlure.org/	hourly	1.0	flyandlure.org	http://www.flyandlure.org/sitemap.xml
1	http://flyandlure.org/about	monthly	1.0	flyandlure.org	http://www.flyandlure.org/sitemap.xml
2	http://flyandlure.org/terms	monthly	1.0	flyandlure.org	http://www.flyandlure.org/sitemap.xml
3	http://flyandlure.org/privacy	monthly	1.0	flyandlure.org	http://www.flyandlure.org/sitemap.xml
4	http://flyandlure.org/copyright	monthly	1.0	flyandlure.org	http://www.flyandlure.org/sitemap.xml
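
As a rough sketch of what you might do next, the example below counts URLs by changefreq and writes the dataframe out as CSV. The dataframe is built by hand from a few of the rows above so the example runs without fetching anything. In practice you would use the dataframe returned by the function, and pass a filename such as "sitemap.csv" to to_csv() instead of a buffer.

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the real dataframe.
df = pd.DataFrame({
    "loc": [
        "http://flyandlure.org/",
        "http://flyandlure.org/about",
        "http://flyandlure.org/terms",
    ],
    "changefreq": ["hourly", "monthly", "monthly"],
    "priority": [1.0, 1.0, 1.0],
})

# Count URLs per change frequency as a quick sanity check on the sitemap.
changefreq_counts = df["changefreq"].value_counts()
print(changefreq_counts)

# Write the dataframe as CSV without the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```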

3. Get Core Web Vitals from PageSpeed Insights

You can also obtain site performance data. The get_core_web_vitals() function retrieves the Core Web Vitals metrics for a list of sites from the Google PageSpeed Insights API and returns results in a Pandas dataframe. The function requires a Google PageSpeed Insights API key.

from ecommercetools import seo

pagespeed_insights_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer']
df = seo.get_core_web_vitals(pagespeed_insights_key, urls)
print(df.head())

	final_url	fetch_time	form_factor	overall_score	speed_index	first_meaningful_paint	first_contentful_paint	time_to_interactive	total_blocking_time	cumulative_layout_shift
0	https://practicaldatascience.co.uk/	2021-03-27T10:56:26.497Z	mobile	74.0	79.0	57.0	61.0	90.0	100	100
3	https://practicaldatascience.co.uk/	2021-03-27T10:57:03.226Z	desktop	95.0	97.0	86.0	87.0	100.0	100	100
1	https://practicaldatascience.co.uk/about	2021-03-27T10:56:37.058Z	mobile	69.0	85.0	61.0	61.0	82.0	100	62
4	https://practicaldatascience.co.uk/about	2021-03-27T10:57:16.035Z	desktop	94.0	96.0	86.0	87.0	100.0	100	67
2	https://practicaldatascience.co.uk/machine-lea...	2021-03-27T10:56:48.098Z	mobile	33.0	52.0	57.0	61.0	19.0	32	82
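
Since each URL gets one row per form factor, a pivot makes it easier to compare mobile and desktop. This is only a sketch: the dataframe is rebuilt by hand from four of the rows above (with a subset of columns), and the "gap" column is my own addition rather than something the package returns.

```python
import pandas as pd

# Rows copied from the output above, standing in for the real dataframe.
df = pd.DataFrame({
    "final_url": [
        "https://practicaldatascience.co.uk/",
        "https://practicaldatascience.co.uk/",
        "https://practicaldatascience.co.uk/about",
        "https://practicaldatascience.co.uk/about",
    ],
    "form_factor": ["mobile", "desktop", "mobile", "desktop"],
    "overall_score": [74.0, 95.0, 69.0, 94.0],
})

# One row per URL, one column per form factor.
scores = df.pivot(index="final_url", columns="form_factor", values="overall_score")

# Difference between desktop and mobile scores (hypothetical helper column).
scores["gap"] = scores["desktop"] - scores["mobile"]
print(scores.sort_values("gap", ascending=False))
```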

4. Get Google Knowledge Graph data

The get_knowledge_graph() function returns the Google Knowledge Graph data for a given search term. This requires the use of a Google Knowledge Graph API key. By default, the function returns output in a Pandas dataframe, but you can pass the output="json" argument if you wish to receive the JSON data back.

from ecommercetools import seo

knowledge_graph_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
knowledge_graph = seo.get_knowledge_graph(knowledge_graph_key, "python", output="dataframe")
print(knowledge_graph)

	resultScore	@type	result.name	result.@id	result.detailedDescription.articleBody	result.detailedDescription.url	result.detailedDescription.license	result.description	result.@type
0	15315.625977	EntitySearchResult	Python	kg:/m/05z1_	Python is an interpreted, high-level and gener...	https://en.wikipedia.org/wiki/Python_(programm...	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	High-level programming language	[Thing, Brand]
1	1671.793579	EntitySearchResult	Python family	kg:/m/05tb5	The Pythonidae, commonly known as pythons, are...	https://en.wikipedia.org/wiki/Pythonidae	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	Snake	[Thing]
2	1301.166748	EntitySearchResult	Pythons	kg:/m/0cv6_m	Python is a genus of constricting snakes in th...	https://en.wikipedia.org/wiki/Python_(genus)	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	Snake	[Thing]
3	497.687103	EntitySearchResult	CPython	kg:/m/06bxxb	CPython is the reference implementation of the...	https://en.wikipedia.org/wiki/CPython	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	NaN	[Thing, SoftwareApplication]
4	378.672913	EntitySearchResult	Python	kg:/m/0l8ry	In Greek mythology, Python was the serpent, so...	https://en.wikipedia.org/wiki/Python_(mythology)	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	NaN	[Thing]
5	312.430939	EntitySearchResult	Reticulated python	kg:/m/0m5qz	The reticulated python is a python species nat...	https://en.wikipedia.org/wiki/Reticulated_python	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	Snake	[Thing]
6	283.799957	EntitySearchResult	Python	kg:/m/02rg562	Python is a double-loop corkscrew roller coast...	https://en.wikipedia.org/wiki/Python_(Efteling)	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	Roller coaster in Kaatsheuvel, Netherlands	[Thing, TouristAttraction]
7	203.535995	EntitySearchResult	Requests	kg:/m/012hn1l3	Requests is a Python HTTP library, released un...	https://en.wikipedia.org/wiki/Requests_(software)	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	NaN	[Thing, SoftwareApplication]
8	171.786148	EntitySearchResult	Python	kg:/m/01v25c	The Rafael Python is a family of air-to-air mi...	https://en.wikipedia.org/wiki/Python_(missile)	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	NaN	[Thing]
9	160.946594	EntitySearchResult	Python Imaging Library	kg:/m/06rx86	Python Imaging Library is a free and open-sour...	https://en.wikipedia.org/wiki/Python_Imaging_L...	https://en.wikipedia.org/wiki/Wikipedia:Text_o...	NaN	[Thing, SoftwareApplication]
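
The result.@type column holds several types per entity, so filtering on it needs a membership test rather than an equality check. Here's a small sketch that keeps only software entities. It uses a hand-built dataframe with four of the rows above, and it assumes the column contains Python lists, as the printed output suggests. The same test also works if the values turn out to be strings.

```python
import pandas as pd

# Four rows copied from the output above, with a subset of columns.
kg = pd.DataFrame({
    "resultScore": [15315.625977, 1671.793579, 497.687103, 203.535995],
    "result.name": ["Python", "Python family", "CPython", "Requests"],
    "result.@type": [
        ["Thing", "Brand"],
        ["Thing"],
        ["Thing", "SoftwareApplication"],
        ["Thing", "SoftwareApplication"],
    ],
})

# True where the entity's type list includes SoftwareApplication.
is_software = kg["result.@type"].apply(lambda types: "SoftwareApplication" in types)

software = kg.loc[is_software, ["result.name", "resultScore"]]
print(software)
```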

5. Get Google Search Console API data

The query_google_search_console() function runs a search query on the Google Search Console API and returns data in a Pandas dataframe. This function requires a JSON client secrets key with access to the Google Search Console API.

from ecommercetools import seo

key = "google-search-console.json"
site_url = "http://flyandlure.org"
payload = {
    'startDate': "2019-01-01",
    'endDate': "2019-12-31",
    'dimensions': ["page", "device", "query"],
    'rowLimit': 100,
    'startRow': 0
}

df = seo.query_google_search_console(key, site_url, payload)
print(df.head())

	page	device	query	clicks	impressions	ctr	position
0	http://flyandlure.org/articles/fly_fishing_gea...	MOBILE	simms freestone waders review	56	217	25.81	3.12
1	http://flyandlure.org/	MOBILE	fly and lure	37	159	23.27	3.81
2	http://flyandlure.org/articles/fly_fishing_gea...	DESKTOP	orvis encounter waders review	35	134	26.12	4.04
3	http://flyandlure.org/articles/fly_fishing_gea...	DESKTOP	simms freestone waders review	35	200	17.50	3.50
4	http://flyandlure.org/	DESKTOP	fly and lure	32	170	18.82	3.09
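
If you run the same query regularly, it can help to generate the payload rather than hard-code the dates. The build_payload() helper below is hypothetical (it's not part of EcommerceTools) and simply returns a dictionary with the same keys as the payload shown above.

```python
from datetime import date, timedelta


def build_payload(end_date, days, dimensions, row_limit=100, start_row=0):
    """Return a payload covering `days` days up to and including `end_date`."""
    start_date = end_date - timedelta(days=days - 1)
    return {
        "startDate": start_date.isoformat(),
        "endDate": end_date.isoformat(),
        "dimensions": list(dimensions),
        "rowLimit": row_limit,
        "startRow": start_row,
    }


# Reproduces the 2019 payload used above.
payload = build_payload(date(2019, 12, 31), 365, ["page", "device", "query"])
print(payload)
```

To query a rolling window instead, pass date.today() as the end date and the number of days you want to cover.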

6. Get the number of “indexed” pages

The get_indexed_pages() function uses the “site:” prefix to search Google for the number of pages “indexed” for each URL. The figure is very approximate, but it’s usually a good guide to site “size” in the absence of other data.

from ecommercetools import seo

urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer', 'http://flyandlure.org']
df = seo.get_indexed_pages(urls)
print(df.head())

	url	indexed_pages
2	http://flyandlure.org	2090
1	https://www.bbc.co.uk/iplayer	215000
0	https://www.bbc.co.uk	12700000
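
To make the “site:” approach concrete, here's a sketch of how a URL can be turned into that kind of search query. The site_query() helper is hypothetical and isn't necessarily how the package builds its queries internally.

```python
from urllib.parse import urlparse


def site_query(url):
    """Build a Google "site:" search query from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Drop the scheme and any trailing slash; keep the host and path.
    return "site:" + (parsed.netloc + parsed.path).rstrip("/")


urls = ['https://www.bbc.co.uk', 'https://www.bbc.co.uk/iplayer', 'http://flyandlure.org']
queries = [site_query(url) for url in urls]
print(queries)
```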

7. Scrape keyword suggestions from Google Autocomplete

The google_autocomplete() function returns a set of keyword suggestions from Google Autocomplete. The include_expanded=True argument allows you to expand the number of suggestions shown by appending prefixes and suffixes to the search terms.

from ecommercetools import seo

suggestions = seo.google_autocomplete("data science", include_expanded=False)
print(suggestions)

suggestions = seo.google_autocomplete("data science", include_expanded=True)
print(suggestions)

	term	relevance
0	data science jobs	650
1	data science jobs chester	601
2	data science course	600
3	data science masters	554
4	data science salary	553
5	data science internship	552
6	data science jobs london	551
7	data science graduate scheme	550
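
To illustrate the idea behind expansion, the sketch below generates variants of a search term by putting a question word before it or a single letter after it. The expand_term() helper and its prefix and suffix lists are my own illustration, not the exact modifiers the package uses.

```python
import string


def expand_term(term, prefixes=("how", "what", "why"), suffixes=string.ascii_lowercase):
    """Return the term plus variants with a word before or a letter after it."""
    expanded = [term]
    expanded += [f"{prefix} {term}" for prefix in prefixes]
    expanded += [f"{term} {suffix}" for suffix in suffixes]
    return expanded


queries = expand_term("data science")
print(queries[:6])
```

Each variant would then be sent to Autocomplete in turn and the suggestions combined, which is why the expanded option returns many more results.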

8. Retrieve robots.txt content

The get_robots() function returns the contents of a robots.txt file in a Pandas dataframe so it can be parsed and analysed.

from ecommercetools import seo

robots = seo.get_robots("http://www.flyandlure.org/robots.txt")
print(robots)

	directive	parameter
0	User-agent	*
1	Disallow	/signin
2	Disallow	/signup
3	Disallow	/users
4	Disallow	/contact
5	Disallow	/activate
6	Disallow	/*/page
7	Disallow	/articles/search
8	Disallow	/search.php
9	Disallow	q=
10	Disallow	category_slug=
11	Disallow	country_slug=
12	Disallow	county_slug=
13	Disallow	features=
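
If you're curious how a robots.txt file maps onto the directive and parameter columns, here's a minimal parsing sketch using only the standard library. The parse_robots() function and the sample text are illustrative assumptions, not the package's own code.

```python
def parse_robots(text):
    """Split robots.txt text into (directive, parameter) pairs."""
    rows = []
    for line in text.splitlines():
        # Remove comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs stay intact.
        directive, parameter = line.split(":", 1)
        rows.append((directive.strip(), parameter.strip()))
    return rows


# Illustrative sample, loosely based on the output above.
sample = """\
User-agent: *
Disallow: /signin
Disallow: /signup  # account pages
Sitemap: http://www.flyandlure.org/sitemap.xml
"""

rows = parse_robots(sample)
disallowed = [parameter for directive, parameter in rows if directive.lower() == "disallow"]
print(disallowed)
```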

9. Scrape Google search engine results

The get_serps() function is one of the quickest and easiest ways to scrape Google search results using Python. It takes a keyword phrase and returns a Pandas dataframe containing the title, link, and text of each Google search result for that phrase.

This is only designed for infrequent use and doesn’t include any features to prevent it from being blocked. If you want to perform large-scale web scraping of Google SERPs then you’ll need a much more sophisticated solution.

from ecommercetools import seo

serps = seo.get_serps("data science blog")
print(serps)

	title	link	text
0	10 of the best data science blogs to follow - ...	https://www.tableau.com/learn/articles/data-sc...	10 of the best data science blogs to follow. T...
1	Best Data Science Blogs to Follow in 2020 | by...	https://towardsdatascience.com/best-data-scien...	14 Jul 2020 — 1. Towards Data Science · Joined...
2	Top 20 Data Science Blogs And Websites For Dat...	https://medium.com/@exastax/top-20-data-scienc...	Top 20 Data Science Blogs And Websites For Dat...
3	Data Science Blog – Dataquest	https://www.dataquest.io/blog/	Browse our data science blog to get helpful ti...
4	51 Awesome Data Science Blogs You Need To Chec...	https://365datascience.com/trending/51-data-sc...	Blog name: DataKind · datakind data science bl...
5	Blogs on AI, Analytics, Data Science, Machine ...	https://www.kdnuggets.com/websites/blogs.html	Individual/small group blogs · Ai4 blog, featu...
6	Data Science Blog – Applied Data Science	https://data-science-blog.com/	... an Bedeutung – DevOps for Data Science. De...
7	Top 10 Data Science and AI Blogs in 2020 - Liv...	https://livecodestream.dev/post/top-data-scien...	Some of the best data science and AI blogs for...
8	Data Science Blogs: 17 Must-Read Blogs for Dat...	https://www.thinkful.com/blog/data-science-blogs/	Data scientists could be considered the magici...
9	rushter/data-science-blogs: A curated list of ...	https://github.com/rushter/data-science-blogs	A curated list of data science blogs. Contribu...
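
Once you have the results, a common next step is to pull out the domain of each link so you can see who ranks for the term. Here's a sketch using a hand-built dataframe containing only the links that are shown in full above, keeping their original row positions. The domain column is my own addition.

```python
from urllib.parse import urlparse

import pandas as pd

# Untruncated links copied from the output above, with their original index.
serps = pd.DataFrame(
    {
        "link": [
            "https://www.dataquest.io/blog/",
            "https://www.kdnuggets.com/websites/blogs.html",
            "https://data-science-blog.com/",
            "https://www.thinkful.com/blog/data-science-blogs/",
            "https://github.com/rushter/data-science-blogs",
        ]
    },
    index=[3, 5, 6, 8, 9],
)

# Extract the host name from each result URL.
serps["domain"] = serps["link"].apply(lambda link: urlparse(link).netloc)
print(serps)
```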

Matt Clarke, Saturday, March 20, 2021

Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.