How to bin or bucket customer data using Pandas

Data binning, also known as bucketing or discrete binning, is a useful technique for preprocessing, understanding, and visualising complex data, especially during the customer segmentation process. It’s applied to continuous...

How to annotate training data for NLP models using Doccano

Whether you’re performing product attribute extraction, named entity recognition, product matching, product categorisation, or review sentiment analysis, or sorting and prioritising customer support tickets, NLP models can be extremely...

Ecommerce and marketing data sets for machine learning

If you read research papers on machine learning, you’ll notice that many researchers use the same standard datasets so other data scientists can reproduce their work or try to improve...

How to use the BG/NBD model to predict customer purchases

You might think human behaviour would be hard to predict but, in ecommerce data science, it’s not actually that difficult to predict whether a customer will...

How to use NLP to identify what drives customer satisfaction

While some people might naively interpret it as negativity, I think one of the best ways you can improve an ecommerce business is to focus on the stuff you’re not...

How to create a BI platform using Apache Superset

Apache Superset is a new “enterprise-ready” web application for building business intelligence (BI) applications and dashboards. Developed at Airbnb using the Flask Python framework, React JS,...

How to use Apache Druid for real-time analytics data storage

Apache Druid is described as a high-performance, real-time analytics database and was developed at Metamarkets in 2011 for their internal analytics system. Unlike traditional relational databases, such as...

How to set up a Docker container for your MySQL server

Like most people who work in ecommerce data science, I regularly need to access data stored in a database, usually MySQL or MariaDB, but sometimes also MSSQL. Although it...

How to use Category Encoders to encode categorical variables

Most datasets you’ll encounter will probably contain categorical variables. They are often highly informative, but the downside is that they’re based on object or datetime data types such as text...