How to create an ABC inventory classification model

Learn how to create an ABC inventory classification model in Python so your procurement manager to keep better track of the product inventory.

How to create an ABC inventory classification model
Pictures by Nana Smirnova and Guillaume Boldoc, Unsplash.
22 minutes to read

ABC inventory classification has been one of the most widely used methods of stock control in operations management for decades. It’s an intentionally simple system in which products are assigned to Classes A-C based on their importance to the business.

It’s designed to help procurement managers prioritise, monitor, manage, and protect sales of SKUs in inventories that often contain thousands, or even tens of thousands, of SKUs.

The aim is for Class A SKUs to be managed so closely that they never go out of stock, for Class B SKUs to be managed well so they rarely, if ever, go out of stock, while allowing staff to free up their time by giving the least attention to the Class C SKUs that sell in low volumes and don’t require constant management.

Why is inventory classification needed in ecommerce?

Many ecommerce sites offer huge numbers of SKUs or stock-keeping units. They are sold at different speeds, bought in different volumes, and shipped from different suppliers around the world, with varying degrees of reliability.

Procurement managers are tasked with purchasing them at the right price, in the right quantities, and at the right time to reduce the likelihood of the company running out, while still trying to avoid tying up the companies money in too much stock.

While software exists to make this much easier, it can still be a difficult trade-off and it often goes wrong, resulting in either an excess of slow-moving stock to clear, or worse, stock-outs which cause massive issues for the customer experience and greatly impact sales.

How does ABC analysis work?

Instead of trying to manage every SKU at an individual level, ABC inventory classification groups them so staff can prioritise some products over others. It offers a generalised approach and allows staff to quickly identify any important SKUs that may be in danger of going out of stock.

ABC inventory classification is based on the Pareto principle, or 80/20 rule, which says that 80% of the sales will come from 20% of the products. It divides products into three classes - A, B, and C - according to specific criteria, such as their sales volume or “annual dollar usage” (ADU) based on their sales revenue in the past 12 months. The classification of each product is then used to control the product inventory and determine the order policy for each item.

  • Class A: Very important items. Generally few in number.

  • Class B: Moderately important items. General modest in number.

  • Class C: Relatively unimportant items. Generally large in number.

Since the Pareto principle invariably applies to product sales in ecommerce businesses, Class A will typically contain a handful of products that generate most of the sales and need careful control. Class B will contain a modest number of slightly less important products, while Class C contains lots of fairly unimportant SKUs that sell in low volumes. It’s simple but effective.

Why three classes?

Despite the name, ABC classification does not really need to be limited to three classes. Adding additional classes increases granularity in the data can be useful when there are high volumes of products or when the distribution is strongly skewed and Class A includes only a few SKUs. It’s actually quite common for it to be extended to include Classes A to E, but it rarely goes beyond this or the benefits of simplicity are lost.

What criteria can be used?

Almost all ABC models are based on annual dollar usage (ADU), or revenue per SKU over the past 12 months. However, the use of a single metric is one of the most common criticisms of the approach. Demand volume (the number of units sold) is often used too.

Demand value: Demand value (or annual dollar usage) is the most common criterion used. It measures the amount of revenue each SKU generated in the previous 12 months.

Demand volume: Demand volume is the second most common single criterion. Rather than looking at the value of each SKU sold, demand volume looks at the number of units.

Can multiple criteria be used?

Yes, quite a lot of the most recent research in this field focuses on multiple criteria inventory classification (or MCIC), which uses more complex models incorporating more than just the single feature seen in most ABC classification models.

Multicriteria inventory classification models look at everything from the availability of the product from suppliers, and the costs of getting it in stock, to the harm to the customer or the business in not having the product on sale.

For example, if you have several substitutes for a product in stock and customers are happy to switch, a stock-out might not cause a major issue. If it’s the only product of its type and can’t be substituted you’ll get no sales while it’s out of stock.

  • Certainty of supply
  • Rate of obsolescence
  • Lead time
  • Substitutability
  • Cost of replenishment
  • Holding cost
  • Purchase price
  • Shortage cost (or criticality)
  • Average order quantity
  • Repairability
  • Requests per year
  • Durability
  • Stock-out unit penalty cost
  • Component importance

Obviously, multicriteria inventory classifications are much harder to build, not least because some of the data can be time consuming and costly to acquire. The models themselves are also significantly more complex than single parameter models incorporating only ADU.

However, some researchers reckon they may not bring any additional benefits. Possibly because the effects of ADU are so strong. However, there may be business-specific use-cases where multicriteria inventory classification is required to handle specific scenarios.

How are inventory classification models created?

Single parameter ABC inventory classification models are based on rankings, so are straightforward to build and are very effective. The multicriteria inventory classification models are much more complex. A whole load of different techniques have been used for these, covering everything from clustering techniques like k-NN to AI methodologies.

  • Weighted linear programming
  • Analytic hierarchy process (AHP)
  • Operations-related groups (ORG)
  • Multiple discriminant analysis (MDA)
  • Support vector machines (SVM)
  • Backpropagation networks (BPNs)
  • k-nearest neighbour algorithm (k-NN)

Since the data are based on product sales, the classifications do need to be updated regularly. Most businesses using this technique will typically have a data pipeline that pre-calculates the rank and class for each product for the past 12 months on a nightly scheduled task.

Does ABC inventory classification have any drawbacks?

It’s a simple approach that works well, but with its simplicity comes some minor disadvantages. However, as this methodology has been so widely used for decades in operations management, these obviously aren’t a big concern for most businesses, and there are simple workarounds for any issues that may arise.

  • Use of ADU: Using annual dollar usage (ADU) for the means of categorising products results in those that generate more revenue being put under careful control and those that don’t getting overlooked. If your inventory system stores components costing pennies that are crucial to the production of Class A SKUs, then there is a risk that stock-outs in the components could escape the attention of procurement managers.

  • Long-term stock-outs: If you’re in the unfortunate position in having unreliable suppliers with very long lead times, it’s also possible that products may move between classes and drop off the radar. Keeping track of historic ABC class and ranks is one way to manage this.

  • Seasonality: Another factor that can affect ABC classes is seasonality. If some products have a short lifespan and only sell at certain times of the year, stock may need to be managed more carefully to ensure stocks are held to cope with peak demand.

  • Inactive products: By default, ABC analysis only seeks to look at the active SKUs in an inventory. However, in most inventories there are loads of slow-moving SKUs that never sell, either because they’re priced un-competitively, aren’t findable, have poor copy or images, or are just not popular. It’s worth extending ABC analysis with an additional letter, i.e. D, to include these non-sellers so they can be fixed or cleared.

  • Non-Pareto distributions: While the Pareto principle usually applies, it does not in every case. However, there’s a really simple workaround - just adjust the positions of the classifications to suit the distribution of sales to match. There’s no written rule that you have to use 80/20.

Does ABC analysis have other applications?

Yes, in B2B ecommerce the technique can also be applied to customers. Like products, customers in B2B ecommerce often also follow the Pareto principle, with a small number of top customers generating the bulk of sales, and a long tail of smaller ones generating the rest.

Customers can be scored in similar ways to products using ABC, or the model can be easily extended to allow it to cope for lapsed customers (i.e. A-E for active customers and F for lapsed customers). Like RFM scores, ABC scores are a practical and easy to use label which is easy for non-technical staff to understand data that would otherwise go over their heads.

Freight containers going to a warehouse.

Single-parameter ABC inventory classification

Extract your product data

In the standard single parameter ABC inventory classification approach, the sum of revenue over the past 12 months (or ADU) is used to calculate the class for each product according to its revenue contribution. Load up Pandas and SQLAlchemy and write a SQL query to extract the data from your database. For speed, I’m performing the calculation in the SQL query rather than in Pandas. Note that the data are ordered in descending order of revenue generated.

import pandas as pd
from sqlalchemy import create_engine
 
engine = create_engine('mysql+pymysql://root:password@172.17.0.3:3306/database_name')

query = """
SELECT 
    order_items.code AS sku, 
    order_items.title AS name,
    SUM(order_items.price) AS revenue,
    SUM(order_items.quantity) AS quantity
FROM order_items
LEFT JOIN orders ON order_items.order_id = orders.id
WHERE order_items.code NOT LIKE ''
AND order_items.code NOT LIKE '-' 
AND orders.date_created BETWEEN DATE_SUB(NOW(), INTERVAL 365 DAY) AND NOW()
GROUP BY order_items.code
ORDER BY SUM(order_items.price) DESC
"""

df = pd.read_sql(query, con=engine)
df.head()
sku name revenue quantity
0 SKU1 Product 1 108822.69 2193.0
1 SKU2 Product 2 96118.36 973.0
2 SKU3 Product 3 81076.77 1847.0
3 SKU4 Product 4 61877.41 515.0
4 SKU5 Product 5 54746.32 635.0

Since these data change on a regular basis, you will need to set up a data pipeline to ensure the classifications are recalculated regularly. Creating a Directed Acyclic Graph in Apache Airflow would be a sensible solution for tackling this.

Calculate cumulative sums

The ABC classes are calculated by calculating the cumulative sum of revenue for each product and then calculating its the running percentage of revenue contribution. For example, the top product here generated £108,822 and made up 1.38% of the total revenue. The top five products generated 5.11% of the total revenue for the business. In the next step, we’ll classify the products generating the top 80% of running revenue percentage to Class A.

df['revenue_cumsum'] = df['revenue'].cumsum()
df['revenue_total'] = df['revenue'].sum()
df['revenue_running_percentage'] = (df['revenue_cumsum'] / df['revenue_total']) * 100
df.head()
sku name revenue quantity revenue_cumsum revenue_total revenue_running_percentage
0 SKU1 Product 1 108822.69 2193.0 108822.69 7875260.39 1.381830
1 SKU2 Product 2 96118.36 973.0 204941.05 7875260.39 2.602340
2 SKU3 Product 3 81076.77 1847.0 286017.82 7875260.39 3.631852
3 SKU4 Product 4 61877.41 515.0 347895.23 7875260.39 4.417571
4 SKU5 Product 5 54746.32 635.0 402641.55 7875260.39 5.112740

Assign classes and ranks

Now we have the values we need to perform the classification we can assign each SKU to its ABC Class. As ABC inventory classification is based on the Pareto principle, this dictates that 20% of your SKUs will contribute 80% of your revenue, so you should assign the SKUs that generate 80% of your revenue to Class A, and the rest to B and C.

However, this doesn’t apply to every site I’ve worked on, so I’d recommend you check rather than blindly using this as a guide. To demonstrate the standard 80/20 rule approach, I’m going to assign the products generating the first 80% of the revenue to Class A. Class B will comprise the products with 80-90%, while Class C includes the bottom 10%. Since products can be at the top or bottom of a Class, we’ll also give each of them a rank so staff can prioritise them within classes.

def classify_product(percentage):
    """Apply an ABC classification to each product based on 
    its ranked percentage revenue contribution. Any split 
    can be used to suit your data. 
    
    :param percentage: Running percentage of revenue contributed
    :return: ABC inventory classification
    """
    
    if percentage > 0 and percentage <= 80:
        return 'A'
    elif percentage > 80 and percentage <= 90:
        return 'B'
    else:
        return 'C'
df['abc_class'] = df['revenue_running_percentage'].apply(classify_product)
df['abc_rank'] = df['revenue_running_percentage'].rank().astype(int)
df.head()
sku name revenue quantity revenue_cumsum revenue_total revenue_running_percentage abc_class abc_rank
0 SKU1 Product 1 108822.69 2193.0 108822.69 7875260.39 1.381830 A 1
1 SKU2 Product 2 96118.36 973.0 204941.05 7875260.39 2.602340 A 2
2 SKU3 Product 3 81076.77 1847.0 286017.82 7875260.39 3.631852 A 3
3 SKU4 Product 4 61877.41 515.0 347895.23 7875260.39 4.417571 A 4
4 SKU5 Product 5 54746.32 635.0 402641.55 7875260.39 5.112740 A 5

Using an 80%, 10%, 10% revenue split for our inventory classification we have 1158 SKUs in Class A, 727 in Class B, and 3379 in Class C, which will be useful for prioritisation. We could consider adjusting the position of the Class bounds to reduce the volume in Class A, as 1158 SKUs is quite a lot to manage closely. Setting the position of Class A to 60% instead of 80% reduces its size to 490 and increases Class B to 1503.

Based on the original 80% split, of the 5264 SKUs sold in the past 12 months, 22% of them were Class A, about 14% were Class B, and 64% were in Class C. The Class C products represent the “long tail” of SKUs that sell in small volumes, but that may be among the SKUs the business needs to stock to ensure customers shop there at all.

df.groupby('abc_class').agg(
    total_skus=('sku', 'count'),
    total_quantity=('quantity', sum),
    total_revenue=('revenue', sum)
)
total_skus total_quantity total_revenue
abc_class
A 1158 190374.0 6299684.93
B 727 41237.0 787485.76
C 3379 80576.0 788089.70

Multicriteria inventory classification

Rather than calculating the ABC classifications manually, as we did above, you can also classify customers using unsupervised learning algorithms, such as K-means clustering. This also allows you to create a simple multi-criteria classification if you feed in some additional parameters. Next we’ll create a really simple multicriteria classification model to show the basic concepts using an unsupervised learning algorithm.

Create and fit a K-means model

First, import the KMeans cluster algorithm from SciKit-Learn’s cluster module, select the numeric columns from your dataframe and create a KMeans model with three centroids, then fit the model to your data.

from sklearn.cluster import KMeans

data = df[['revenue', 'quantity', 'revenue_cumsum', 'revenue_running_percentage']]

kmeans = KMeans(n_clusters=3)
kmeans.fit(data)
KMeans(n_clusters=3)

Map classes to cluster labels

Next, extract the cluster ID from labels_ and assign it to each SKU in the dataframe. As we made three clusters, and numbers count up from zero in Python, the clusters will be numbered, 0, 1, and 2. We can use a lambda function to apply() a dictionary mapping of our cluster numbers to their corresponding ABC classes and assign that to a new column called class_kmeans.

data.loc[:, 'cluster'] = kmeans.labels_
class_map = {0: "C", 1: "B", 2: "A"}
data["class_kmeans"] = data["cluster"].apply(lambda x: class_map[x])

Examine the classifications

Finally, we can compare the quantities of products placed in the ABC classes using K-means clustering using a groupby() function with an aggregation. The unsupervised learning approach gives a similar result to the structured method, with 1126 in Class A (versus 1216), 428 in Class B (versus 777) and 4135 in Class C (versus 3696.)

data.groupby('cluster').agg(
    total_skus=('cluster', 'count'),
    total_quantity=('quantity', sum),
    total_revenue=('revenue', sum)
)
total_skus total_quantity total_revenue
class_kmeans
A 1126 124089.0 3207595.12
B 428 174820.0 6501785.93
C 4135 145352.0 1677943.01

It would be worth examining the two classifications on your own data and experimenting with the addition of additional criteria, just in case the standard single criteria classification doesn’t work on your particular product inventory.

Further reading

Douissa, M.R. and Jabeur, K., 2016, January. A New Model for Multi-criteria ABC Inventory Classification: PROAFTN Method. In KES (pp. 550-559).

Ramanathan, R., 2006. ABC inventory classification with multiple-criteria using weighted linear optimization. Computers & Operations Research, 33(3), pp.695-700.

Teunter, R.H., Babai, M.Z. and Syntetos, A.A., 2010. ABC classification: service levels and inventory costs. Production and Operations Management, 19(3), pp.343-352.

Matt Clarke, Thursday, March 04, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Joining Data with pandas

Learn to combine data from multiple tables by joining data together using pandas.

Start course for FREE

Comments