When scraping websites, or checking how well a site is configured for crawling, it pays to parse the site’s robots.txt file carefully. This file, which should be...
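As a minimal sketch, Python’s built-in urllib.robotparser can fetch and query a robots.txt file; the domain below is just a placeholder:

```python
from urllib import robotparser

# Point the parser at the site's robots.txt (placeholder domain)
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Check whether a given user agent is allowed to fetch a given URL
print(parser.can_fetch("*", "https://example.com/some-page"))

# List any sitemap URLs declared in the file (Python 3.8+)
print(parser.site_maps())
```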
Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, its underlying content strategy, or its product ranges, and many...
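A rough sketch of how this could be done with requests and BeautifulSoup (the URL is a placeholder, and you’d normally loop over a crawled list of pages):

```python
import requests
from bs4 import BeautifulSoup

def get_title_and_description(url):
    """Return the <title> text and meta description of a page."""
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content", "") if meta else ""
    return title, description

print(get_title_and_description("https://example.com/"))
```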
404 page not found errors and 301 redirect chains can both be costly and damaging to a website’s performance. They’re easy to introduce, especially on ecommerce sites...
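One way to spot them, sketched with requests (the URL is a placeholder): request each URL, record the final status code, and inspect response.history for any redirect hops along the way.

```python
import requests

def check_url(url):
    """Return a URL's final status code and any redirect chain it passed through."""
    response = requests.get(url, allow_redirects=True, timeout=10)
    # response.history holds one Response per intermediate redirect (e.g. 301s)
    chain = [(r.status_code, r.url) for r in response.history]
    return response.status_code, chain

status, chain = check_url("https://example.com/old-product-page")
print(status, chain)  # e.g. 404 with an empty chain, or 200 after several 301s
```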
Large, uncompressed images slow down your site, increase bandwidth costs, harm the user experience, and impact search engine rankings. In this project, I’ll show you how you can bulk resize...
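A minimal sketch of the idea using Pillow, assuming a folder of JPEGs (the directory names and the bulk_resize helper are illustrative):

```python
from pathlib import Path
from PIL import Image

def bulk_resize(source_dir, output_dir, max_width=1200, quality=80):
    """Resize every JPEG in source_dir to max_width and save a compressed copy."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for path in Path(source_dir).glob("*.jpg"):
        with Image.open(path) as img:
            if img.width > max_width:
                ratio = max_width / img.width
                img = img.resize((max_width, int(img.height * ratio)))
            img.save(Path(output_dir) / path.name, "JPEG", quality=quality, optimize=True)

bulk_resize("images", "images_resized")
```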
Many data science projects involve a lot of repetition. In tasks that utilise Natural Language Processing (NLP), for example, you’ll always need to preprocess your text to...
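A sketch of one such reusable step: a small helper that lowercases, strips punctuation, tokenises, and drops stop words (the tiny stop word set here is only for illustration):

```python
import re

def preprocess(text, stop_words=None):
    """Lowercase, strip punctuation, tokenise, and remove stop words."""
    stop_words = stop_words or {"the", "a", "an", "and", "or", "of", "to", "in"}
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    return [token for token in text.split() if token not in stop_words]

print(preprocess("Scraping robots.txt files, and parsing them in Python!"))
```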
XML sitemaps are designed to make life easier for search engines by providing an index of a site’s URLs. However, they’re also a useful tool in competitor analysis and allow...
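A minimal sketch of fetching a sitemap and extracting its URLs with requests and ElementTree (the sitemap URL is a placeholder, and sitemap index files that point to child sitemaps would need an extra loop):

```python
import requests
import xml.etree.ElementTree as ET

def get_sitemap_urls(sitemap_url):
    """Fetch an XML sitemap and return the page URLs it lists."""
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    response = requests.get(sitemap_url, timeout=10)
    root = ET.fromstring(response.content)
    return [loc.text for loc in root.findall(".//sm:loc", ns)]

urls = get_sitemap_urls("https://example.com/sitemap.xml")
print(len(urls), urls[:5])
```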
URLs contain useful information that can be used to analyse a website, a user’s search, or the breakdown of content in each section. While they often look pretty...
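Python’s standard library handles the parsing itself; a quick sketch with urllib.parse (the URL is made up):

```python
from urllib.parse import urlparse, parse_qs

url = "https://example.com/blog/python-seo/?utm_source=newsletter&page=2"
parts = urlparse(url)

print(parts.netloc)           # example.com
print(parts.path.split("/"))  # ['', 'blog', 'python-seo', '']
print(parse_qs(parts.query))  # {'utm_source': ['newsletter'], 'page': ['2']}
```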
Keyword cannibalisation occurs when you have several pages ranking for the same phrase, effectively putting them in competition with each other for search engine rankings. Since Google generally only shows...
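One way to surface candidates, sketched with pandas on a hypothetical Search Console export (the queries, pages, and click counts below are made up): flag any query for which more than one distinct page appears.

```python
import pandas as pd

# Hypothetical export: one row per query/page combination
df = pd.DataFrame({
    "query": ["python seo", "python seo", "scrape sitemaps", "parse robots.txt"],
    "page": ["/blog/python-seo/", "/guides/python-seo/", "/blog/sitemaps/", "/blog/robots/"],
    "clicks": [120, 45, 80, 15],
})

# Queries where more than one distinct page ranks are cannibalisation candidates
pages_per_query = df.groupby("query")["page"].nunique()
candidates = pages_per_query[pages_per_query > 1].index

print(df[df["query"].isin(candidates)].sort_values(["query", "clicks"], ascending=[True, False]))
```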
In many data science projects you may need to download remote data, such as images, CSV files, or compressed archives. Python makes it fairly straightforward to download files within your...
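A small sketch using requests to stream a remote file to disk (the URL and filename are placeholders):

```python
import requests

def download_file(url, output_path, chunk_size=8192):
    """Stream a remote file to disk without loading it all into memory."""
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(output_path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return output_path

download_file("https://example.com/data/export.zip", "export.zip")
```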