A quick guide to catalogue marketing data science

Catalogue marketing is dying out. Over the past few years, virtually all the UK’s top catalogue retailers have stopped printing on paper and successfully transitioned their businesses online, either to...

How to use knee point detection in k means clustering

When using the k means clustering algorithm, you need to specifically define k, or the number of clusters you want the algorithm to create. Rather than selecting an arbitrary value,...

How to use Extruct to identify Schema.org metadata usage

The downside to building datasets using web scraping is that every site has custom HTML. If you scrape sites in this way, you’ll forever be building bespoke scrapers, and they’ll...

How to unzip files with Python

Most very large datasets tend to get compressed on servers to preserve storage space and bandwidth and allow them to be downloaded more quickly by end users. Python includes some...

How to unserialize serialized PHP arrays using Python

If you regularly work with ecommerce data, you’re likely to have encountered PHP serialized arrays or objects. Serialization is a process used to take a complex data structure, such as...

How to send data to Google Analytics in Python with PyGAMP

The Google Analytics Measurement Protocol API lets you add data to your GA account that hasn’t been triggered by a user visiting a web page. Since it’s so flexible, you...

How to scrape Open Graph protocol data using Python

Many websites include Open Graph protocol data in their document head. This structured data allows social networks, such as Facebook and Twitter, to access specific elements of the page’s content...

How to scrape and parse a robots.txt file using Python

When scraping websites, and when checking how well a site is configured for crawling, it pays to carefully check and parse the site’s robots.txt file. This file, which should be...

How to scrape a site's page titles and meta descriptions

Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, the underlying content strategy, or product ranges, and many...