Practical Data Science

How to visualise internal linking in Python using NetworkX graphs

Adding internal links to articles helps reduce bounce rate by promoting related content that site visitors may find interesting, but it also has a powerful impact on search engine optimisation or...

How to perform tokenization in NLP with NLTK and Python

Tokenization is a data science technique that breaks a sentence up into a comma-separated list of distinct words or values. It’s a crucial first step in...

How to calculate the ecommerce KPIs you need to hit your revenue target

In ecommerce, you’ll typically be given a revenue target your site needs to hit every month. In my experience, these revenue targets are often proposed by finance directors, CEOs, or...

How to create a Naive Bayes text classification model using scikit-learn

Naive Bayes classifiers are commonly used for machine learning text classification problems, such as predicting the sentiment of a tweet, identifying the language of a piece of text, or categorising...

How to use cross validation in scikit-learn machine learning models

When training a machine learning model, you will typically split your dataset in two, with one portion of the data used to train the model, and the other portion (usually 20-30%)...

How to create a random forest classification model using scikit-learn

The random forest model or random decision forest model is a supervised machine learning algorithm that can be used for classification or regression problems. It’s what’s known as an ensemble...

How to create a decision tree classification model using scikit-learn

The Decision Tree or DT is one of the best-known and most widely used supervised machine learning algorithms, and can be applied to both regression and classification. As...

How to dedupe lists in Python with set() and intersection()

When working with Python lists, you’ll often need to remove duplicate values present in a single list, remove duplicates found in multiple lists, or identify the...
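
As a rough illustration of the idea in this teaser, here is a minimal sketch using only the built-in `set()` type and its `intersection()` method. The list names and fruit values are hypothetical sample data, not taken from the article, and `sorted()` is added only because sets do not preserve order.

```python
# Hypothetical sample data, for illustration only.
list_a = ["apple", "banana", "apple", "cherry"]
list_b = ["banana", "cherry", "cherry", "date"]

# set() discards duplicates within a single list. Sets are unordered,
# so sorted() is used here to get a stable, readable result.
unique_a = sorted(set(list_a))

# The union of two sets removes duplicates found across both lists.
unique_all = sorted(set(list_a) | set(list_b))

# intersection() keeps only the values that appear in both lists.
common = sorted(set(list_a).intersection(list_b))

print(unique_a)
print(unique_all)
print(common)
```

If the original order of the list matters, `list(dict.fromkeys(list_a))` is a common alternative to `set()` that keeps the first occurrence of each value in place.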

How to use DATEDIFF() to calculate date differences in MySQL

When working with customer data or on time series data science projects, you’ll often find the need to calculate the difference between two dates in your MySQL queries. The MySQL...
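
For readers who want to sanity-check a query result outside the database, MySQL's `DATEDIFF(expr1, expr2)` returns `expr1` minus `expr2` as a number of days, using only the date parts of the values. The sketch below mirrors that behaviour in Python. The helper name `datediff` and the sample dates are assumptions made for illustration; this is not the article's code and it does not query MySQL.

```python
from datetime import date, datetime


def datediff(end, start):
    """Return the whole days from start to end, ignoring any time part.

    Hypothetical helper that mirrors the argument order of
    MySQL's DATEDIFF(end, start).
    """
    def to_date(value):
        # datetime is a subclass of date, so strip the time part if present.
        return value.date() if isinstance(value, datetime) else value

    return (to_date(end) - to_date(start)).days


# Hypothetical sample values, for illustration only.
days_in_feb = datediff(date(2021, 3, 1), date(2021, 2, 1))
across_midnight = datediff(datetime(2021, 1, 2, 0, 5), datetime(2021, 1, 1, 23, 55))
reversed_order = datediff(date(2021, 2, 1), date(2021, 3, 1))

print(days_in_feb)
print(across_midnight)
print(reversed_order)
```

Because only the date parts are compared, two timestamps ten minutes apart on either side of midnight count as one day apart, and swapping the arguments flips the sign.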