Web scraping

Articles tagged Web scraping

How to run time-based SEO tests using Python

One of the problems with search engine optimisation or SEO is that search engine algorithms are essentially black boxes. They analyse so many on-page and off-page factors, and use multiple...

How to identify SEO keyword opportunities with Python

One of the most useful Python SEO projects you can undertake is to identify the top keywords for which each of your site’s pages are ranking for. Sometimes, these keywords...

How to read an RSS feed in Python

RSS feeds have been a mainstay on the web for over 20 years now. These XML-based documents are generated by web servers and designed to be read in RSS feed...

19 Python SEO projects that will improve your site

Although I have never really considered myself a technical SEO, I do need to do quite a bit of SEO work as part of my role as an Ecommerce Director....

How to identify internal and external links using Python

Internal linking helps improve the user experience by recommending related content to users, which both reduces bounce rate, and helps search engine optimisation efforts. While there are no hard and...

How to scrape Google results in three lines of Python code

EcommerceTools makes it really quick and easy to scrape Google search engine results in Python. In this simple project, we’ll use EcommerceTools to search Google for your chosen keywords, scrape...

How to create a product and price metadata scraper

In ecommerce, price monitoring is a really important consideration. If you offer your products at a price which is too high within the market, you may lose sales to rivals,...

How to scrape schema.org metadata using Python

As I’ve mentioned in previous posts on web scraping, the most efficient way to scrape data is to identify what Schema.org metadata is in use and then create a microdata...

How to scrape People Also Ask data using Python

People Also Ask or PAA boxes have been becoming increasingly common in Google’s search results over the past few years. They show a range of questions and answers related to...

How to scrape Google search results using Python

Although I suspect you are probably not technically allowed to do it, I doubt there’s an SEO in the land who hasn’t scraped Google search engine results to analyse them,...

How to identify SEO keywords using Google Autocomplete

The Google Autocomplete feature, or Google Suggest as it was previously known, has become a part of everyday life for us all. Start typing a search term into Google, and...

How to create a UK data science jobs dataset

According to the Harvard Business Review, the role of data scientist is said to be “the sexiest job of the 21st century” . Data science and data engineering skills are...

How to count indexed pages using Python

One quick and easy way to understand the size of a website, and its growth rate, is to examine the number of its web pages Google has indexed. You can...

How to access the Google Knowledge Graph Search API

The Google Knowledge Graph database includes an astronomical amount of data on almost every topic you can think of, allowing Google to create Knowledge Panels and infoboxes that summarise search...

How to use Extruct to identify Schema.org metadata usage

The downside to building datasets using web scraping is that every site has custom HTML. If you scrape sites in this way, you’ll forever be building bespoke scrapers, and they’ll...

How to scrape Open Graph protocol data using Python

Many websites include Open Graph protocol data in their document head. This structured data allows social networks, such as Facebook and Twitter, to access specific elements of the page’s content...

How to scrape and parse a robots.txt file using Python

When scraping websites, and when checking how well a site is configured for crawling, it pays to carefully check and parse the site’s robots.txt file. This file, which should be...

How to scrape a site's page titles and meta descriptions

Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, the underlying content strategy, or product ranges, and many...

How to scan a site for 404 errors and 301 redirect chains

Both 404 page not found errors and 301 redirect chains can be costly and damaging to the performance of a website. They’re both easy to introduce, especially on ecommerce sites...

How to parse XML sitemaps using Python

XML sitemaps are designed to make life easier for search engines by providing an index of a site’s URLs. However, they’re also a useful tool in competitor analysis and allow...

How to parse URL structures using Python

URLs often contain useful information that can be used to analyse a website, a user’s search, or the breakdown of content present in each section. While they often look pretty...

How to build a web scraper using Requests-HTML

Unless you’re building a large and complex web scraper using Scrapy or Selenium, it’s probable that you’ll utilise Requests and Beautiful Soup. These two packages are brilliant for web scraping....

How to create a Python web scraper using Beautiful Soup

Web scraping is a really useful skill in data science. We obviously need data for our models and analyses, but it’s not always easily available, so building our own datasets...

How to create image datasets for machine learning models

While many models are now pre-trained to identify certain objects, in most cases you will need to undertake further training. This requires the construction of image classification datasets containing a...

How to use NLP to identify what drives customer satisfaction

While some people might naively interpret it as negativity, I think one of the best ways you can improve an ecommerce business is to focus on the stuff you’re not...

How to scrape JSON-LD competitor reviews using Extruct

In the ecommerce sector, you can learn a lot about your competitors and the expectations of your customers by analysing the reviews their customers leave for products and service on...

How to scrape competitor technology data in Python

In ecommerce, it pays to watch what your competitors are doing, so over the past decade or so in which I’ve managed ecommerce businesses, I’ve regularly undertaken competitor analyses. They’re...