How to calculate category management metrics in Python

Category management metrics can let you understand product sales and be more strategic in your pricing and product launches. Here’s how to calculate them.

How to calculate category management metrics in Python
Picture by Hanson Lu, Unsplash.
14 minutes to read

Category management is a retail technique that breaks down a company’s product range into groups of related items, such as categories, or subcategories, or by their product type. By running each category as its own miniature business unit, with its own budgets, targets, and strategies, the business can ensure that each part of the product portfolio gets attention and isn’t inadvertently neglected.

While not every business in the ecommerce sector has the benefit of trained category managers, there are nonetheless a range of useful metrics that can be borrowed from the field of category management to help ecommerce staff better monitor and optimise their product sales. Here are a selection of some of the most useful, and the code you need to calculate them using Python.

Product trial rate

Product trial rate, or just trial, measures the proportion of purchasers who purchased a product for the first time during a period to try it out. Of course, whether a product is actually being trialled depends on the typical repurchase rate for its product type.

As I’ve explained before, different product types have different repurchase rates, and some are rarely ever purchased for replenishment, so you need to understand the type of product you’re examining when you use this metric.

However, for those products which are replenished, trial rate is a useful measure, especially during the New Product Development (NPD) and product launch process, where marketing and ecommerce will want to get customers to try a new product or switch from a less expensive or less profitable brand.

A low trial rate either suggests that the product type is one that is commonly replenished, or that customers aren’t being persuaded to purchase. That could be because it’s too expensive, hasn’t been marketed to them, or because it’s just not attractive.

def product_trial_rate(number_of_first_time_purchases, total_purchasers):
    """Returns the percentage of customers who trialled a product for the first time during a period.

    Args:
        number_of_first_time_purchases (int): Total number of unique first-time purchasers during a period.
        total_purchasers (int): Total number of unique purchasers during a period.

    Returns:
        Percentage of customers who purchased product for the first time during a period.
    """

    return (number_of_first_time_purchases / total_purchasers) * 100
print('Product trial rate:', product_trial_rate(number_of_first_time_purchases=45, 
                                                total_purchasers=345))
Product trial rate: 13.043478260869565

Product repurchase rate

Product repurchase rate is the natural companion to Product trial rate and looks at the percentage of customers who purchased a given SKU for the second time or more. Product repurchase rate can tell you numerous things, so is well worth monitoring.

It can tell you whether the product type is likely to be replenished at all - most people only ever buy one model of car, fridge, or phone, and never replace it with a like-for-like match, or whether they liked the product, price, and service enough to buy it again. It’s a really useful metric to monitor across products, and monitor over time, since changes in price, availability, or “findability”, can impact it.

def product_repurchase_rate(number_of_repeat_purchasers, total_purchasers):
    """Returns the percentage of customers who purchased a product for the second time or more.

    Args:
        number_of_repeat_purchasers (int): Total number of unique repeat purchasers during a period.
        total_purchasers (int): Total number of unique purchasers during a period.

    Returns:
        Percentage of customers who purchased product for the second time or more during a period.
    """

    return (number_of_repeat_purchasers / total_purchasers) * 100
print('Product repurchase rate:', product_repurchase_rate(number_of_repeat_purchasers=321, 
                                                          total_purchasers=345))
Product repurchase rate: 93.04347826086956

Product consumption rate

Product consumption rate measures the typical purchase quantities for each SKU. Obviously, this varies from product to product. Most people will only ever require a single laptop, phone, or kettle, at any one time, so these have low rates of consumption. However, bananas, apples, and potatoes, are almost always purchased at volume.

Product consumption is impacted by the product type, the customer need, the product price, the product perishability, the product availability, promotions, despatch speed, and panic buying. Understanding consumption rates allows you to determine whether an average quantity of, say, 1.5 units per order is good or not, for each given product type.

def product_consumption_rate(total_items, total_orders):
    """Returns the average number of units per order.

    Args:
        total_items (int): Total number of items of a SKU sold during a period.
        total_orders (int): Total number of orders during a period.

    Returns:
        Average number of units per order. 
    """

    return (total_items / total_orders) * 100
print('Product consumption rate:', product_consumption_rate(total_items=1724, 
                                                           total_orders=9389))
Product consumption rate: 18.36191287677069

Brand usage

Ordinarily, brand usage index data is derived from customer surveys which seek to determine what percentage of customers use certain brands. However, equivalent data can also be obtained from ecommerce data warehouses, at least for the brands a retailer already sells.

Here, brand usage measures the number of brand purchasers over the total number of purchasers in a period, usually within a specific product category. It’s a good way to understand brand popularity and is usually measured alongside product trial rate, product repurchase rate, and product consumption rate, to get a more detailed picture of how well a brand sells.

def brand_usage(number_of_brand_users, total_purchasers):
    """Returns the percentage of brand usage for a period.

    Args:
        number_of_brand_users (int): Total number of unique users of a brand in a period.
        total_purchasers (int): Total unique purchasers in a period.

    Returns:
        Percentage of purchasers who used brand X in the period.
    """

    return (number_of_brand_users / total_purchasers) * 100
print('Brand usage:', brand_usage(number_of_brand_users=100, 
                                  total_purchasers=345))
Brand usage: 28.985507246376812
Brand penetration rate

Brand penetration rate, or simply penetration rate, is confusingly similar to brand usage, but measures the percentage of people who have purchased a brand at some point within the period examined, rather than those currently using it. It’s designed to show what proportion of customers have trialled or used a brand at some point.

Like brand usage, it’s normally calculated using survey data, but a surrogate can be obtained using sales data and setting different time periods. It’s often used for broader studies, such as examining what percentage of households buy coffee, or use Cilit Bang, or whatever.

def brand_penetration_rate(number_of_brand_purchasers, total_purchasers):
    """Returns the percentage of penetration rate for a brand.

    Args:
        number_of_brand_purchasers (int): Total number of unique purchasers of a brand.
        total_purchasers (int): Total unique purchasers.

    Returns:
        Percentage of purchasers who have purchased the brand.
    """

    return (number_of_brand_purchasers / total_purchasers) * 100
print('Brand penetration rate:', brand_penetration_rate(number_of_brand_purchasers=200, 
                                                        total_purchasers=1000))
Brand penetration rate: 20.0
Product turnover

Product turnover, or sell-through rate, is calculated based on the number of units sold within a period over the average stock holding for the product. It’s supposed to be used as an indicator of product demand. It’s commonly used, but to be honest, it’s not a metric I find to be that useful.

def product_turnover(units_sold_in_period, average_items_stocked_in_period):
    """Return the product turnover (or sell through rate) for a 
    product based on units sold versus items stocked.

    Args:
        units_sold_in_period (int): Number of units of product X sold in the period.
        average_items_stocked_in_period (int): Average stock holding for product X in the period.

    Returns:
        product_turnover (float): Percentage of average stock holding sold during the period.
    """

    return (units_sold_in_period / average_items_stocked_in_period) * 100
print('Product turnover:', product_turnover(units_sold_in_period=50, 
                                            average_items_stocked_in_period=100))
Product turnover: 50.0
Price Index

The Price Index shows the price relationship of product X over the price of product Y. The point of this metric is to help marketers identify the market positioning of the product in relation to other products of a similar type. For example, is the product “Good”, “Better”, or “Best”, in GBB terms?

This is important for a couple of reasons. Firstly, if the procurement team has provided a setup sheet for a new product launch to the ecommerce team, and the product has a price index placing it in the “Best” segment, what are the product features that mean it’s so expensive and how can the ecommerce copywriters best sell these as benefits? As a result, it can also ensure the price is set correctly so that it generates sales.

Secondly, it makes a difference when providing product recommendations. Customers often tend to buy products of similar GBB categories, so if they usually go with the cheapest brands, the probability of them going with a premium item are going to be lower. As a result, it’s useful to calculate it against other products, and the average price of products within a specific category.

def price_index(price_of_product_x, price_of_product_y):
    """Return the price index of product X over product Y.

    Args:
        price_of_product_x (float): Price of product X.
        price_of_product_y (float): Price of product Y.

    Returns:
        price_index (float): Price of X / Price of Y
    """

    return (price_of_product_x / price_of_product_y) * 100
print('Price Index:', price_index(price_of_product_x=10.99, price_of_product_y=15.99))
Price Index: 68.73045653533458
Purchase intention

Purchase intention is another useful metric for ecommerce. Ordinarily, in the offline retail world, it’s calculated using data from quantitative consumer studies. However, three variants can be calculated easily from web analytics data, and a couple are already included within Google Analytics.

If you consider an online retail website, the way to increase sales is to get people onto the site, get them to look at a product, add it to their cart, start the checkout process, and then complete their order. Therefore, the first measure of purchase intention you can use is the percentage of sessions which included a product page view.

Next, you can measure the percentage of people who viewed the product page and added the item to their cart (or the cart-to-detail rate), then you can examine the percentage who added to cart and reached the checkout (or the buy to detail rate), before finally examining the conversion rate for the SKU. You end up with a nice little funnel of metrics showing purchase intention and where customers drop out.

def purchase_intention(people_who_declared_interest, total_people):
    """Returns the purchase intention rate for a product.

    This can be used for cart-to-detail, buy-to-detail, and similar calculations.

    Args:
        people_who_declared_interest (int): Number of people who declared interest in a product.
        total_people (int): Total number of people.

    Returns:
        Percentage of people who were interested in a product.
    """

    return (people_who_declared_interest / total_people) * 100
print('Purchase intention:', purchase_intention(people_who_declared_interest=100, 
                                                total_people=1000))
Purchase intention: 10.0
Product satisfaction

Finally, there’s product satisfaction. Alongside the product return rate, this one tells you which of the customers who kept the product they purchased were actually happy with it.

It can be calculated in a couple of ways. The easiest is to count the total number of “positive” star ratings for a product received during a period (i.e. those who rated it 4 or 5 out of 5). However, you can also use NLP to better understand the actual reasons why they liked, or didn’t like, the product.

def product_satisfaction(total_reviews, positive_reviews):
    """Return the product satisfaction score for a period.

    Args:
        total_reviews (int): Total number of reviews received within the period.
        positive_reviews (int): Total number of positive reviews received within the period.

    Returns:
        Percentage (float) of positive reviews received.
    """

    return (positive_reviews / total_reviews) * 100
print('Product satisfaction:', product_satisfaction(total_reviews=1000, 
                                                    positive_reviews=900))
Product satisfaction: 90.0

Matt Clarke, Saturday, March 13, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Joining Data with pandas

Learn to combine data from multiple tables by joining data together using pandas.

Start course for FREE

Comments