Major internet retailers, like Walmart and Amazon, have been at the forefront of ecommerce data science and data analytics for many years, contributing lots of interesting papers to data science research journals. However, recent advances in data science technology and the growth of data skills in the ecommerce field mean that machine learning and AI are now gaining traction at smaller online retail businesses.
Machine learning algorithms and AI can be applied to almost any area of internet retailing, from the site technology and user experience, to marketing and operations management. However, if you’re new to the field, it can be tricky to see how data science and machine learning can be used to help improve your business performance. Here are some ideas for ecommerce data science projects and use cases to inspire your next projects.
Sentiment analysis is a Natural Language Processing data analysis technique that allows you to classify text content according to whether it’s positive, negative, or neutral. Ecommerce businesses can use sentiment analysis for many things, from analysing social media posts, incoming emails, or customer messages to flag up the ones which are negative for a speedier response, to examining the sentiment of your product reviews or those of your rivals.
The classic way data scientists at online retailers perform sentiment analysis is to construct a supervised learning model, or a recurrent neural network using something like Long Short-Term Memory via Keras and TensorFlow. However, the advent of the Transformer model architecture, and the Hugging Face group, means that there are now extremely powerful pre-trained models available to use and fine-tune.
Sentiment analysis data helps businesses extract insights from their data and better understand customers and their likes and dislikes, which can aid the strategic decision making process and provide evidence stakeholders may want before making bigger changes to the online retail business.
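To make the classification step concrete, here is a minimal sketch of flagging negative customer messages. It is a crude word-list baseline, far simpler than the LSTM or pre-trained Transformer models mentioned above, and the word lists, the `classify` function, and the sample messages are all hypothetical.

```python
import re

# Hypothetical word lists; a real project would use a trained model instead.
POSITIVE = {"great", "love", "excellent", "fast", "perfect", "happy"}
NEGATIVE = {"broken", "late", "poor", "refund", "terrible", "disappointed"}


def classify(text):
    """Label text as positive, negative, or neutral by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up incoming messages.
messages = [
    "Great service and fast delivery, love it",
    "Item arrived broken and late, I want a refund",
    "What are your opening hours?",
]

# Messages to route to the customer service team for a speedier response.
flagged = [m for m in messages if classify(m) == "negative"]
```

A word list like this misses negation and sarcasm, which is exactly where the pre-trained models earn their keep.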
Customer segmentation is one of the most common uses for data science within ecommerce companies and has a wide range of practical applications. It is generally used for improving the understanding of customers and how they behave during the shopping process, and for improving the targeting of marketing activities via more complex predictive models.
Customers can be segmented in many ways in order to better understand them or solve marketing goals. This can be based on their shopping behaviour and online transactions, their demographics, and numerous other things.
For a very long time, the classic RFM model, which looks at the Recency, Frequency, and Monetary value of customers has remained one of the most widely used, and most effective customer segmentation models in ecommerce businesses. It’s also very simple to implement, with no need for deep learning or fancy machine learning algorithms.
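As a rough illustration of how little is needed, the sketch below builds an RFM table from a made-up order log using only the standard library. The order data, the `rfm_table` and `rank_scores` helpers, and the choice of three score bins are assumptions for the example rather than a standard recipe.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, order_value)
orders = [
    ("c1", date(2021, 4, 20), 120.0),
    ("c1", date(2021, 3, 2), 80.0),
    ("c2", date(2020, 11, 15), 35.0),
    ("c3", date(2021, 4, 1), 60.0),
    ("c3", date(2021, 2, 10), 45.0),
    ("c3", date(2020, 12, 24), 150.0),
]


def rfm_table(orders, today):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    table = {}
    for customer_id, order_date, value in orders:
        days_ago = (today - order_date).days
        recency, frequency, monetary = table.get(customer_id, (days_ago, 0, 0.0))
        table[customer_id] = (min(recency, days_ago), frequency + 1, monetary + value)
    return table


def rank_scores(values, higher_is_better=True, bins=3):
    """Rank customers on one metric and map the rank to a 1..bins score."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cid: 1 + (rank * bins) // len(ordered) for rank, cid in enumerate(ordered)}


rfm = rfm_table(orders, today=date(2021, 4, 29))
# A low recency (a recent order) is good, so it is scored in reverse.
r = rank_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
f = rank_scores({c: v[1] for c, v in rfm.items()})
m = rank_scores({c: v[2] for c, v in rfm.items()})
segments = {c: f"{r[c]}{f[c]}{m[c]}" for c in rfm}
```

Each customer ends up with a three-digit label, where a higher digit is better on that dimension, which you can then group into named segments however suits your marketing.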
Customer churn is the loss of customers due to attrition and is the opposite of customer retention. Customers are very expensive to acquire and replace, so it pays to know when they’re going to churn so you can step in and prevent it before it happens. It is also essential for accurately determining how many of your customers are still customers.
Calculating churn is easier in contractual businesses, such as phone or broadband providers, since it is easy to see when customers are about to churn as their contracts near the end date. However, predicting churn in non-contractual markets like ecommerce is much harder because the time of a customer’s “death” (in Customer Lifetime Value terminology) is not known, and instead needs to be predicted via complex machine learning algorithms.
Inventory classification is an operations management approach commonly used in ecommerce to help procurement managers reduce the percentage of stock-outs on the lines which generate the most revenue for the business. ABC analysis is the most commonly used method. This uses the Pareto principle or 80/20 rule to allocate products into three (or more) classes based on their cumulative contribution to overall revenue.
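The cumulative revenue calculation behind ABC analysis can be sketched in a few lines. The revenue figures are invented, and the 80% and 95% cut-offs for classes A and B are a common convention that I'm assuming here, not fixed rules.

```python
# Hypothetical revenue per SKU over the analysis period.
revenue_by_sku = {"sku1": 520, "sku2": 250, "sku3": 130, "sku4": 60, "sku5": 40}


def abc_classify(revenue_by_sku, a_cutoff=0.8, b_cutoff=0.95):
    """Assign A, B, or C based on cumulative share of total revenue."""
    total = sum(revenue_by_sku.values())
    classes, cumulative = {}, 0
    # Walk through SKUs from highest to lowest revenue.
    ranked = sorted(revenue_by_sku.items(), key=lambda kv: kv[1], reverse=True)
    for sku, revenue in ranked:
        cumulative += revenue
        share = cumulative / total
        if share <= a_cutoff:
            classes[sku] = "A"
        elif share <= b_cutoff:
            classes[sku] = "B"
        else:
            classes[sku] = "C"
    return classes


abc = abc_classify(revenue_by_sku)
```

With these figures, the top two SKUs account for 77% of revenue and land in class A, so they are the lines to protect from stock-outs first.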
XYZ analysis is a similar system, usually used alongside ABC inventory classification, but designed to measure the predictability of product sales. Some products have little standard deviation in their sales volumes, while others might not sell for ages and then see sudden spikes in demand. Using ABC and XYZ classification is one of the best ways to help procurement staff stay on top of thousands of SKUs.
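One common way to quantify that predictability is the coefficient of variation, which is the standard deviation of sales divided by the mean. The sketch below assumes monthly unit sales per SKU and uses cut-offs of 0.5 and 1.0, which are illustrative values you would tune for your own range.

```python
from statistics import mean, pstdev

# Hypothetical monthly unit sales per SKU.
sales_by_sku = {
    "sku1": [10, 11, 9, 10, 10, 10],   # steady seller
    "sku2": [2, 18, 6, 14, 4, 16],     # variable seller
    "sku3": [0, 0, 0, 60, 0, 0],       # one sudden spike
}


def xyz_classify(sales_by_sku, x_cutoff=0.5, y_cutoff=1.0):
    """Assign X, Y, or Z based on the coefficient of variation of sales."""
    classes = {}
    for sku, sales in sales_by_sku.items():
        average = mean(sales)
        # Products with no sales at all are treated as unpredictable.
        cv = pstdev(sales) / average if average else float("inf")
        if cv <= x_cutoff:
            classes[sku] = "X"
        elif cv <= y_cutoff:
            classes[sku] = "Y"
        else:
            classes[sku] = "Z"
    return classes


xyz = xyz_classify(sales_by_sku)
```

Joining the X, Y, or Z label to each product's ABC class gives combined classes such as AX or CZ that procurement staff can prioritise by.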
Price is a major consideration for many customers so the most savvy retailers monitor and benchmark their prices against those of their rivals and make adjustments when required. Python-based scraping tools make this process easier, but scraping applications can quickly grow into complex projects.
One way to simplify the web scraping process in ecommerce is to utilise schema.org markup present in most quality web pages. Tools such as Extruct allow you to extract content added by the vendor to aid search engines for your own competitor insights, which massively decreases technical debt.
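To show the idea, here is a minimal sketch that pulls schema.org Product data out of a page's JSON-LD block using only the Python standard library. It is a stand-in to illustrate the approach, not Extruct's API, and the `JsonLdParser` class and sample HTML are hypothetical. Extruct itself also handles other syntaxes, such as microdata and RDFa.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup rather than fail the whole scrape.


# Made-up product page; a real scraper would fetch this over HTTP.
html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Kettle",
 "offers": {"@type": "Offer", "price": "24.99", "priceCurrency": "GBP"}}
</script>
</head><body><h1>Example Kettle</h1></body></html>
"""

parser = JsonLdParser()
parser.feed(html)
products = [i for i in parser.items if isinstance(i, dict) and i.get("@type") == "Product"]
prices = [
    (p["name"], float(p["offers"]["price"]), p["offers"]["priceCurrency"])
    for p in products
]
```

Because the markup follows a shared vocabulary, the same extraction code works across many competitor sites without writing custom selectors for each one.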
If you’re scraping prices from competitors, or if you’re running a product aggregator like PriceRunner or Google Shopping, then you’re going to need to get to grips with product matching. As the name suggests, product matching aims to find exact matches between products sold by different online retailers.
This allows online businesses to have confidence that the prices are being matched on a like-for-like basis, and ensures aggregators group together products of the same type. Product matching isn’t the easiest thing to tackle, but it can be framed as a supervised machine learning problem and tackled with algorithms such as extreme gradient boosting and related techniques.
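A supervised product matching model needs features that describe how similar two listings are. The sketch below computes two simple title similarity features with the standard library; the `match_features` helper and the product titles are hypothetical, and in practice you would feed features like these, along with labelled match and non-match pairs, into a classifier such as a gradient boosting model.

```python
from difflib import SequenceMatcher


def tokens(title):
    """Lower-case a title and split it into a set of words."""
    return set(title.lower().replace("-", " ").split())


def match_features(title_a, title_b):
    """Return similarity features for a pair of product titles."""
    a, b = tokens(title_a), tokens(title_b)
    union = a | b
    # Share of words the two titles have in common, ignoring word order.
    jaccard = len(a & b) / len(union) if union else 0.0
    # Character-level similarity, which is sensitive to word order.
    char_ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    return {"token_jaccard": jaccard, "char_ratio": char_ratio}


# Made-up titles from two retailers.
ours = "Acme Kettle 1.7L Stainless Steel"
same_product = "ACME 1.7L stainless steel kettle"
other_product = "Acme Toaster 2-Slice White"

likely_match = match_features(ours, same_product)
likely_mismatch = match_features(ours, other_product)
```

Title features alone are rarely enough, so real projects tend to add brand, price, and identifier comparisons where those fields are available.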
Data science techniques, including web scraping and NLP, can be very useful in speeding up the competitor analysis process many ecommerce businesses undertake annually or bi-annually as part of their data analytics strategy.
I’ve applied these to scraping data on the performance of social media content, competitor websites, the underlying technologies they are using, and the prices of their products, as well as the reviews customers have posted on review platforms such as TrustPilot and Feefo.
Data analysis from these techniques can help retail businesses understand the issues that annoy potentially loyal customers and help them improve the customer experience.
Market Basket Analysis is one of the oldest techniques still used in ecommerce data science. It aims to find associations between products to identify things commonly purchased together, and it is commonly used for helping trading managers or marketers determine the right products to promote. However, ecommerce stores can also use it for category management, loyalty programmes, and many other applications.
There are several ways to perform Market Basket Analysis or MBA, but the most common approach is to use the Apriori algorithm. This identifies frequent itemsets, or groups of products that appear together within baskets. From these data, association rules can be calculated which can tell the retailer how strong the relationships are, allowing them to create product recommendations or offers to help boost sales.
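The metrics behind association rules are simple enough to compute by hand on a small example. The sketch below counts product pairs by brute force rather than implementing Apriori's candidate pruning, so it only illustrates support, confidence, and lift. The baskets and the 0.3 minimum support are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of products per order.
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "stove"},
    {"sleeping bag", "torch"},
    {"tent", "sleeping bag", "torch"},
]


def pair_rules(baskets, min_support=0.3):
    """Return {(antecedent, consequent): (support, confidence, lift)} for pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # How often the consequent appears when the antecedent is present.
            confidence = count / item_counts[antecedent]
            # Confidence relative to the consequent's overall popularity.
            lift = confidence / (item_counts[consequent] / n)
            rules[(antecedent, consequent)] = (support, confidence, lift)
    return rules


rules = pair_rules(baskets)
```

Here every basket containing a stove also contains a tent, so that rule has a confidence of 1.0, and a lift above 1 suggests the pairing happens more often than chance alone would predict.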
When you run an ecommerce site with thousands of product pages, keeping track of them all can be challenging. Anomaly detection can be useful in helping you determine when something has changed, perhaps due to an administrative error, an SEO issue, or a pricing or stock problem.
The Anomaly Detection Toolkit (ADTK) is excellent for building custom anomaly detection models for ecommerce data. It features a wide range of models to detect different kinds of anomaly, and lets you extract the raw data and plot the anomalies easily. It can be easily applied to sales data, Google Analytics data, or even Google Search Console SEO data.
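To give a flavour of what one of the simpler detectors does, here is a standard library sketch that flags values far outside the interquartile range. It is not ADTK's API, just an illustration of the underlying idea, and the `iqr_anomalies` function, the multiplier of 3, and the order counts are assumptions.

```python
from statistics import quantiles

# Hypothetical daily order counts, with one day where something went wrong.
daily_orders = [102, 98, 105, 99, 101, 97, 103, 100, 12, 104, 98, 101]


def iqr_anomalies(series, c=3.0):
    """Return the positions of values beyond c * IQR from the quartiles."""
    q1, _, q3 = quantiles(series, n=4)
    iqr = q3 - q1
    low, high = q1 - c * iqr, q3 + c * iqr
    return [i for i, value in enumerate(series) if value < low or value > high]


anomalies = iqr_anomalies(daily_orders)
```

A fixed threshold like this ignores trend and seasonality, which is where a dedicated toolkit with a range of detectors becomes worthwhile.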
Forecasting isn’t that easy, especially in markets where a store’s sales are influenced by uncontrollable outside factors, like the weather or pandemics. The Auto-regressive Integrated Moving Average model (ARIMA) and its derivatives, such as SARIMA, and the more complex neural network Long Short-Term Memory (LSTM) model, are all popular choices.
However, the Prophet and NeuralProphet forecasting models, which were developed by Facebook, are also worth considering. They are faster at model fitting and include several features that make them ideally suited to ecommerce, including the ability to introduce custom holiday and trading calendars and the option to pass in additional regressors with information on outside factors that might pull sales or traffic away from the expected trend.
In recent years, there’s been a huge movement in the SEO community away from Microsoft Excel and towards Python for SEO. There are a number of innovative “SEO Pythonistas” creating some really interesting analyses and tools that use Python to aid their technical SEO work, many of which are applicable to ecommerce websites.
As these are among the most useful for my day-to-day work, I’ve spent many weekends building little tools that I can slot into automations I can run at work, to save me time and let me focus on more interesting projects, instead of spending hours doing mundane, repetitive data collection and analysis.
There are many great ways you can use your data science skills to aid your customer service team. Most of these relate to identifying why customers are making contact and then working with them to reduce the need for customers to raise tickets, and instead self-serve or convert online.
Analysing the customer experience helps businesses identify the problems customers are encountering when using the company website, which may either prevent them making purchases, lead to negative customer reviews, cause churn, or place the customer service team under pressure.
Common approaches used to improve things for customer services teams via data analytics and data science include automatically classifying support tickets or customer service emails, using NLP to identify what is causing customers to make contact or leave negative reviews, and churn and survival analysis methods that examine the factors correlated with unhappy customers.
As a former magazine journalist, it worries me a bit that AI may eventually replace some copywriters. However, as an ecommerce director, I also love the fact that Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG), are now so powerful that data scientists can use them to automate boring and time consuming manual tasks.
I’ve already applied these Natural Language Processing techniques to checking content for potential errors, using Natural Language Understanding to identify whether the content answers the questions customers will be seeking answers to prior to purchase, and even machine-generating product summaries from scratch via deep learning.
In a retail business, forecasting is one thing, but there are also models that allow you to predict which individual customers are going to order and when, and even what they’ll buy, and how much they’ll spend. These have many great applications, particularly in targeted email marketing and personalisation.
One of the classic techniques is the calculation of Customer Lifetime Value (CLV), which analyses each customer’s past pattern of transactions to predict what they will do in the future. While the CLV phrase is bandied around most ecommerce offices, CLV is actually harder to do properly than most data scientists probably envisage.
The algorithm used helps retailers predict what each of their customers is going to spend over the coming period, which aids the decision making process in setting the acceptable Customer Acquisition Cost (CAC) to ensure maximum profitability.
Finally, there are a whole load of ways you can analyse your category and product data to help your category manager or procurement teams. These range from understanding the products that are commonly purchased in bulk, or repurchased repeatedly, to calculating the various operations and procurement metrics you need to help ensure you don’t run out of stock.
Matt Clarke, Thursday, April 29, 2021