Catalogue marketing is dying out. Over the past few years, virtually all the UK’s top catalogue retailers have stopped printing on paper and successfully transitioned their businesses online, either to...
When using the k-means clustering algorithm, you need to define k, the number of clusters you want the algorithm to create. Rather than selecting an arbitrary value,...
The downside to building datasets using web scraping is that every site has custom HTML. If you scrape sites in this way, you’ll forever be building bespoke scrapers, and they’ll...
Most very large datasets are compressed on servers to save storage space and bandwidth, which also lets end users download them more quickly. Python includes some...
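As a quick illustration of the idea, here is a minimal sketch that reads gzip-compressed text with the standard library's `gzip` module. The helper name `read_gzip_text` and the sample CSV content are made up for this example, and gzip is only one of several compression formats you might encounter.

```python
import gzip
import io


def read_gzip_text(data: bytes, encoding: str = "utf-8") -> str:
    """Decompress gzip bytes and return the decoded text (hypothetical helper)."""
    # gzip.open accepts a file-like object as well as a filename.
    with gzip.open(io.BytesIO(data), mode="rt", encoding=encoding) as handle:
        return handle.read()


# Illustrative data only: compress a tiny CSV in memory, then read it back.
sample = gzip.compress("id,price\n1,9.99\n".encode("utf-8"))
text = read_gzip_text(sample)
print(text)
```

For a file on disk, passing the path straight to `gzip.open(path, "rt")` should work the same way.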
If you regularly work with ecommerce data, you’re likely to have encountered PHP serialized arrays or objects. Serialization is a process used to take a complex data structure, such as...
The Google Analytics Measurement Protocol API lets you add data to your GA account that hasn’t been triggered by a user visiting a web page. Since it’s so flexible, you...
Many websites include Open Graph protocol data in their document head. This structured data allows social networks, such as Facebook and Twitter, to access specific elements of the page’s content...
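To give a feel for what this data looks like, the sketch below pulls `og:` properties out of a page's `<meta>` tags using only the standard library's `html.parser`. The class name `OpenGraphParser`, the function `extract_open_graph`, and the sample HTML are assumptions for illustration; a real scraper would fetch the page first and would probably use a more forgiving HTML library.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags into a dict (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.properties.setdefault(prop, content)


def extract_open_graph(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.properties


# Made-up sample document for demonstration.
sample_html = """<html><head>
<meta property="og:title" content="Example page">
<meta property="og:type" content="article">
<meta name="description" content="Not Open Graph">
</head><body></body></html>"""

og = extract_open_graph(sample_html)
print(og)
```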
When scraping websites, and when checking how well a site is configured for crawling, it pays to carefully check and parse the site’s robots.txt file. This file, which should be...
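As a rough sketch of that kind of check, Python's standard library ships `urllib.robotparser`, which can parse robots.txt rules and answer whether a URL may be fetched. The robots.txt content and the example.com URLs below are invented for the example, and `site_maps()` assumes Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; normally you would download the real file.
robots_txt = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a generic crawler may fetch each URL.
allowed = parser.can_fetch("*", "https://example.com/products/")
blocked = parser.can_fetch("*", "https://example.com/checkout/basket")

# Crawl delay and sitemap URLs declared in the file, if any.
delay = parser.crawl_delay("*")
sitemaps = parser.site_maps()

print(allowed, blocked, delay, sitemaps)
```

To check a live site instead, `set_url()` followed by `read()` fetches and parses the file in one step.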
Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, the underlying content strategy, or product ranges, and many...