Web scraping

34 articles tagged Web scraping

How to create a Shopify price tracker with Python

In ecommerce, it’s very common for retailers to need to monitor the prices of their competitors. Prices make a big difference to sales, and if they’re set too high, then...

How to scrape a Shopify site in Python via products.json

Since many modern websites use JavaScript and JSON to build their pages, you can sometimes find public-facing APIs buried in the page code that give you access to structured...
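
As a rough illustration, here is a minimal sketch that flattens a products.json payload into one row per product variant. It assumes the payload has a top-level products list whose items carry title, handle and a variants list with prices stored as strings. Check a real response before relying on those field names. The function name is hypothetical.

```python
import json


def parse_products_json(text):
    """Flatten a Shopify-style products.json payload into one row per variant.

    Assumed shape: {"products": [{"title": ..., "handle": ...,
    "variants": [{"title": ..., "price": "19.99"}]}]}
    """
    data = json.loads(text)
    rows = []
    for product in data.get("products", []):
        for variant in product.get("variants", []):
            rows.append({
                "product": product.get("title"),
                "handle": product.get("handle"),
                "variant": variant.get("title"),
                # Prices are assumed to arrive as strings, so convert them
                "price": float(variant["price"]) if variant.get("price") else None,
            })
    return rows
```

To try it against a live store, you could pass the bytes returned by urllib.request.urlopen("https://example-store.com/products.json").read() to parse_products_json, where example-store.com is a placeholder domain.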

How to use CSS and XPath custom extraction in Advertools

The Advertools web scraping package, popular in the Python SEO community, automatically extracts a wide range of page elements, such as the title, meta description, and various schema.org and OpenGraph...

How to scrape a website using Advertools

For larger web scraping projects, the Scrapy web scraping Python package is one of the most effective tools. It’s powerful, fast, and has a huge range of features. However,...

How to check if URLs are redirected using Requests

The requests HTTP library for Python allows you to make HTTP requests to servers and receive back HTTP status codes, site content, and other data. It’s extremely useful for building...

How to read an XML feed into a Pandas dataframe

XML feeds are a data format that uses Extensible Markup Language to provide structured data that can be read by search engines and online advertising providers. For example, a Google...

16 Python web scraping projects for ecommerce and SEO

Web scraping is a programming technique that uses a script or bot to visit one or more websites and extract specific elements or HTML tags from the source code of...

How to run time-based SEO tests using Python

One of the problems with search engine optimisation, or SEO, is that search engine algorithms are essentially black boxes. They analyse so many on-page and off-page factors, and use multiple...

How to identify SEO keyword opportunities with Python

One of the most useful Python SEO projects you can undertake is to identify the top keywords for which each of your site’s pages is ranking. Sometimes, these keywords...

How to read an RSS feed in Python

RSS feeds have been a mainstay on the web for over 20 years now. These XML-based documents are generated by web servers and designed to be read in RSS feed...
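
A minimal sketch using only the standard library's xml.etree.ElementTree. It assumes an RSS 2.0 feed, where each item element sits inside channel and has title, link and pubDate children. Atom feeds, or feeds that put these fields in extra namespaces, would need different element names. The function name is illustrative.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Return a list of dicts (title, link, pubDate), one per RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    # RSS 2.0 nests <item> elements inside <rss><channel>
    for item in root.iter("item"):
        items.append({
            # findtext returns None when a child element is missing
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
        })
    return items
```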

19 Python SEO projects that will improve your site

Although I have never really considered myself a technical SEO, I do need to do quite a bit of SEO work as part of my role as an Ecommerce Director....

How to identify internal and external links using Python

Internal linking helps improve the user experience by recommending related content to users, which both reduces bounce rate and helps search engine optimisation efforts. While there are no hard and...
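
One way to sketch the idea with the standard library is to collect every a href with html.parser, resolve each link against the page URL, and compare hostnames. Treating a link as internal only when its hostname exactly matches the page's is an assumption. Under this rule, www.example.com and example.com count as different sites, so you may want to normalise hostnames for your own site. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html, page_url):
    """Split a page's links into (internal, external) lists of absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links like /about
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if parsed.netloc.lower() == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```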

How to scrape Google results in three lines of Python code

EcommerceTools makes it really quick and easy to scrape Google search engine results in Python. In this simple project, we’ll use EcommerceTools to search Google for your chosen keywords, use...

How to create a product and price metadata scraper

In ecommerce, price monitoring is a really important consideration. If you offer your products at a price which is too high within the market, you may lose sales to rivals,...

How to scrape schema.org metadata using Python

As I’ve mentioned in previous posts on web scraping, the most efficient way to scrape data is to identify what Schema.org metadata is in use and then create a microdata...

How to scrape People Also Ask data using Python

People Also Ask, or PAA, boxes have become increasingly common in Google’s search results over the past few years. They show a range of questions and answers related to...

How to scrape Google search results using Python

Although I suspect you are probably not technically allowed to do it, I doubt there’s an SEO in the land who hasn’t scraped Google search engine results to analyse them,...

How to identify SEO keywords using Google Autocomplete

The Google Autocomplete feature, or Google Suggest as it was previously known, has become a part of everyday life for us all. Start typing a search term into Google, and...

How to create a UK data science jobs dataset

According to the Harvard Business Review, the role of data scientist is said to be “the sexiest job of the 21st century”. Data science and data engineering skills are said...

How to count indexed pages using Python

One quick and easy way to understand the size of a website, and its growth rate, is to examine the number of its web pages Google has indexed. You can...

How to access the Google Knowledge Graph Search API

The Google Knowledge Graph database includes an astronomical amount of data on almost every topic you can think of, allowing Google to create Knowledge Panels and infoboxes that summarise search...

How to use Extruct to identify Schema.org metadata usage

The downside to building datasets using web scraping is that every site has custom HTML. If you scrape sites in this way, you’ll forever be building bespoke scrapers, and they’ll...

How to scrape Open Graph protocol data using Python

Many websites include Open Graph protocol data in their document head. This structured data allows social networks, such as Facebook and Twitter, to access specific elements of the page’s content...
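
A minimal standard-library sketch that reads og: properties from meta tags with html.parser. It assumes the tags use the property attribute, as the Open Graph protocol specifies. Some sites use name instead, and this version would miss those. The names are illustrative.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags into a dict."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if prop.startswith("og:") and "content" in attrs:
            # Keep the first occurrence if a property is repeated
            self.og.setdefault(prop, attrs["content"])


def scrape_open_graph(html):
    """Return a dict mapping og: property names to their content values."""
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og
```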

How to scrape and parse a robots.txt file using Python

When scraping websites, and when checking how well a site is configured for crawling, it pays to carefully check and parse the site’s robots.txt file. This file, which should be...
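
Python's standard library ships urllib.robotparser, which can parse robots.txt content and tell you whether a given user agent may fetch a URL. This is one option and not necessarily the approach the full article takes. The sketch below passes the file in as a string rather than fetching it, so it can be tested offline. RobotFileParser does simple path-prefix matching, so Google-style wildcard rules may not be interpreted the way Googlebot would interpret them.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(robots_txt):
    """Parse robots.txt content supplied as a string and return the parser."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines, as read from the file
    parser.parse(robots_txt.splitlines())
    return parser


robots = """User-agent: *
Disallow: /checkout/
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""
rp = parse_robots(robots)
print(rp.can_fetch("*", "https://example.com/checkout/cart"))  # False
print(rp.crawl_delay("*"))  # 5
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

For a live site, RobotFileParser("https://example.com/robots.txt") followed by read() fetches and parses the file in one step. The site_maps() method requires Python 3.8 or later.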

How to scrape a site's page titles and meta descriptions

Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, the underlying content strategy, or product ranges, and many...
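
A minimal sketch using html.parser to pull the title text and the meta description from a single page's HTML. Fetching pages and crawling the whole site are left out. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class TitleMetaParser(HTMLParser):
    """Extract the <title> text and the meta description from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def title_and_description(html):
    """Return (title, meta description) for one page; description may be None."""
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description
```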

How to scan a site for 404 errors and 301 redirect chains

Both 404 “page not found” errors and 301 redirect chains can be costly and damaging to the performance of a website. They’re both easy to introduce, especially on ecommerce sites...

How to parse XML sitemaps using Python

XML sitemaps are designed to make life easier for search engines by providing an index of a site’s URLs. However, they’re also a useful tool in competitor analysis and allow...
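
A minimal sketch with xml.etree.ElementTree, assuming a standard urlset sitemap in the sitemaps.org namespace. Sitemap index files wrap their child sitemaps in sitemapindex and sitemap elements instead, so they would need an extra pass to follow each child loc. The function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return one dict (loc, lastmod) per <url> entry in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", SITEMAP_NS):
        rows.append({
            "loc": url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip(),
            # lastmod is optional in the protocol, so this may be None
            "lastmod": url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
        })
    return rows
```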

How to parse URL structures using Python

URLs often contain useful information that can be used to analyse a website, a user’s search, or the breakdown of content present in each section. While they often look pretty...
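
The standard library's urllib.parse does most of the work here. This sketch splits a URL into its scheme, domain, path segments, query parameters and fragment. The example URL and its category structure are made up.

```python
from urllib.parse import parse_qs, urlparse


def url_parts(url):
    """Break a URL into its components, path segments and query parameters."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "domain": parsed.netloc,
        "path": parsed.path,
        # Non-empty path segments, e.g. category and product slugs
        "segments": [s for s in parsed.path.split("/") if s],
        # parse_qs maps each parameter to a list, since keys can repeat
        "query": parse_qs(parsed.query),
        "fragment": parsed.fragment,
    }


parts = url_parts("https://www.example.com/mens/shoes/running?colour=blue&size=9#reviews")
print(parts["segments"])  # ['mens', 'shoes', 'running']
print(parts["query"])  # {'colour': ['blue'], 'size': ['9']}
```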

How to build a web scraper using Requests-HTML

Unless you’re building a large and complex web scraper using Scrapy or Selenium, it’s probable that you’ll utilise Requests and Beautiful Soup. These two packages are brilliant for web scraping....

How to create a Python web scraper using Beautiful Soup

Web scraping is a really useful skill in data science. We obviously need data for our models and analyses, but it’s not always easily available, so building our own datasets...

How to create image datasets for machine learning models

While many models are now pre-trained to identify certain objects, in most cases you will need to undertake further training. This requires the construction of image classification datasets containing a...

How to use NLP to identify what drives customer satisfaction

While some people might naively interpret it as negativity, I think one of the best ways you can improve an ecommerce business is to focus on the stuff you’re not...

How to scrape JSON-LD competitor reviews using Extruct

In the ecommerce sector, you can learn a lot about your competitors and the expectations of your customers by analysing the reviews their customers leave for products and service on...
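
The full article uses Extruct. As a dependency-free illustration of the same idea, this sketch collects script blocks of type application/ld+json with html.parser and returns the review entries attached to schema.org Product objects. It assumes reviews are nested under a top-level Product's review property. Sites that use @graph, or publish reviews separately, would need extra handling. The names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect and decode every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_reviews(html):
    """Return schema.org review objects nested under top-level Product items."""
    parser = JsonLdParser()
    parser.feed(html)
    reviews = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product":
                found = item.get("review", [])
                # review may be a single object or a list of objects
                reviews.extend(found if isinstance(found, list) else [found])
    return reviews
```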

How to scrape competitor technology data in Python

In ecommerce, it pays to watch what your competitors are doing, so over the past decade or so in which I’ve managed ecommerce businesses, I’ve regularly undertaken competitor analyses. They’re...