How to create synthetic data sets for machine learning

While there are many open source datasets available for you to use when learning new data science techniques, sometimes you may struggle to find a data set to use to...

How to create image datasets for machine learning models

While many models are now pre-trained to identify certain objects, in most cases you will need to undertake further training. This requires the construction of image classification datasets containing a...

How to create an ABC inventory classification model

ABC inventory classification has been one of the most widely used methods of stock control in operations management for decades. It’s an intentionally simple system in which products are assigned...

How to connect to MySQL via an SSH tunnel in Python

Many MySQL databases are configured to accept connections from other servers on the local network and will reject connections from remote machines. Ordinarily, you could work around this by creating...

How to calculate relative dates for Google Analytics queries

The Google Analytics add-on for Google Sheets allows you to use the Google Analytics reporting API to create custom weekly reports and schedule them to run. However, to run a...

How to bin or bucket customer data using Pandas

Data binning, bucketing, or discrete binning, is a very useful technique for both preprocessing and understanding or visualising complex data, especially during the customer segmentation process. It’s applied to continuous...

How to annotate training data for NLP models using Doccano

Whether you’re performing product attribute extraction, named entity recognition, product matching, product categorisation, review sentiment analysis, or you are sorting and prioritising customer support tickets, NLP models can be extremely...

Ecommerce and marketing data sets for machine learning

If you read research papers on machine learning, you’ll notice that many researchers use the same standard datasets so other data scientists can reproduce their work or try and improve...

How to use the BG/NBD model to predict customer purchases

You might think human behaviour would be hard to predict but, in ecommerce data science, it’s not actually as difficult as you may think to predict whether a customer will...