How to scan a site for 404 errors and 301 redirect chains

Both 404 page not found errors and 301 redirect chains can be costly and damaging to the performance of a website. They’re both easy to introduce, especially on ecommerce sites...

How to resize and compress images in Python with the TinyPNG API

Large, uncompressed images slow down your site, increase bandwidth costs, harm the user experience, and impact search engine rankings. In this project, I’ll show you how you can bulk resize...

How to preprocess text for NLP in four easy steps

There’s often a lot of repetition in many data science projects. In tasks that utilise Natural Language Processing (or NLP), for example, you’ll always need to preprocess your text to...

How to parse XML sitemaps using Python

XML sitemaps are designed to make life easier for search engines by providing an index of a site’s URLs. However, they’re also a useful tool in competitor analysis and allow...

How to parse URL structures using Python

URLs often contain useful information that can be used to analyse a website, a user’s search, or the breakdown of content present in each section. While they often look pretty...

How to identify keyword cannibalisation using Python

Keyword cannibalisation occurs when you have several pages ranking for the same phrase, effectively putting them in competition with each other for search engine rankings. Since Google generally only shows...

How to download files with Python

In many data science projects you may need to download remote data, such as images, CSV files, or compressed data. Python makes it fairly straightforward to download files within your...

How to detect sarcasm using machine learning

I love sarcasm, but unfortunately I have a shaky ability to easily detect it in the voices of others, an aptitude for misinterpreting serious comments for sarcasm and then inappropriately...

How to detect fake news with machine learning

Long before Donald Trump erroneously applied it to mean “news that he didn’t agree with”, the term “fake news” referred to disinformation and misleading editorial content. In recent years, it’s...