Customer Lifetime Value or CLV (also erroneously called Lifetime Customer Value or Lifetime Value) is one of the most misunderstood of all marketing metrics. Weirdly, everyone in marketing understands its importance, but few can define it sufficiently to measure it correctly. The exact same thing applies to customer retention, which is also commonly misunderstood by most marketers.
Interestingly, there’s no universally agreed formula or calculation for calculating Customer Lifetime Value. Neither the Marketing Accountability Standards Board (MASB) or the Marketing Metric Audit Protocol (MMAP) has published an independently audited methodology for its calculation, so you’ll see various ways to achieve a similar result.
As there’s no agreed definition or calculation among marketers, when presented with a Customer Lifetime Value metric from a marketer, or a marketing tool, you can never really be sure what they’re reporting. That problem is amplified if you work in a non-contractual business setting, such as an ecommerce site, where the accurate prediction of CLV is even more challenging.
There are various definitions for Customer Lifetime Value but the most commonly accepted one is that CLV is the “discounted value of future profits that a customer yields to the company.” That is, it’s a prediction of the amount of profit each customer will generate in the future, not a metric measuring their current behaviour. Nor an aggregate measure of all customers.
Customer Lifetime Value is the discounted value of future profits a customer yields to the company.
In order to calculate or predict CLV, you need to be able to do two things: predict the future number of transactions each customer will make, and predict how much profit you’ll generate from them. That’s not easy and most marketers can’t build the complex predictive models required to do so.
The other important keyword you’ll see in most definitions of CLV is “discount” or “discounted value.” Most marketers won’t have a clue what that means, and assume it’s something to do with discounts from reduced product prices or promotions, but it’s not. In this context, “discount” actually refers to an interest rate used to calculate the value of cash flow, not a monetary discount from the goods.
Given that you also need to know which of your customers are going to come back to determine their retention, as well as how much they’ll spend, and how much it will cost you to get them to do so, you can see why calculating CLV is actually not trivial.
That’s why most marketers, and many off-the-shelf marketing platforms, using a simplified metric or heuristic and call it CLV instead of actually calculating the true CLV. These usually measure the current customer value instead of predicting future Customer Lifetime Value, which technically isn’t the same thing, though can still be useful to monitor and understand.
If you simply look at two of the variables required to calculate CLV - the expected value of a customer’s future orders and the number of orders they’ll make - you might figure out that CLV is going to be easier to calculate in a contractual setting, where there’s a clearly defined start and end to a customer’s relationship, and you know their future spend.
In a non-contractual business, such as an ecommerce or traditional retail setting, you can never really be sure how many of your customers are still customers, unless you have a model to tell you this. Therefore, in order to calculate CLV you first need to understand customer retention or customer churn, for which you’ll require a churn model.
Contractual churn models, such as those for subscription businesses, are easier to create than non-contractual churn models because the time of a customer’s “death” (in CLV terms) is known. Therefore, modern approaches to predicting CLV combine machine learning techniques to predict churn or attrition in customers.
One of the most popular modern methods for predicting Customer Lifetime Value is the Beta Geometric Negative Binomial Distribution or BG/NBD model. This is used to predict the number of future purchases each customer will make. When combined with another model, the Gamma-Gamma model to predict how much each customer will spend, you can calculate CLV, even in a more complex non-contractual business setting.
Calculating CLV using Gamma-Gamma and BG/NBD isn’t that simple and goes way beyond the skills of most marketers and may not be known to the developers of many marketing platforms. Therefore, when faced with the problem of calculating a metric they don’t really understand, most people will need to resort to a simple alternative called a heuristic instead.
Heuristic techniques are designed to be practical but are also known to be suboptimal, imperfect, or flawed. Nonetheless, they are often sufficient to provide a very rough steer in the absence of the proper metric. You’ll often see or hear these heuristic metrics described as “CLV”, but in reality they’re nothing of the sort.
Sometimes they don’t even have anything to do with CLV in its proper context. CLV is a calculation made at the customer level, and predicts the future, but lots of CLV heuristics are made at the business level and provide an aggregate metric, not an individual one, and they almost never make any predictions of what will happen later.
That said, that doesn’t mean these alternative metrics are pointless. In the total absence of the actual CLV, they’re undoubtedly better than measuring nothing and do still allow marketers to optimise something useful. In this project, we’ll look at how you can calculate some simple Customer Lifetime Value heuristics to use instead of calculating CLV the correct way via a model.
First, open a Jupyter notebook and import the Pandas and Numpy packages.
import pandas as pd import numpy as np
Next, import a transaction item dataset. You can use the Online Retail dataset if you don’t have one to hand. The dataset includes the
customer_id for each line item, plus the
order_date. We need to cast the
order_date to a datetime with Pandas, and create a new column to hold the
order_items = pd.read_csv('online_retail.csv') order_items['line_price'] = order_items['quantity'] * order_items['unit_price'] order_items['order_date'] = pd.to_datetime(order_items['order_date'])
Next, we need to create an
orders dataframe from our
order_items dataframe using a Pandas
groupby() with an
agg() function. This gives us aggregated data on each order in the dataset.
orders = order_items.groupby('order_id').agg( customer_id=('customer_id', 'max'), order_date=('order_date', 'max'), revenue=('line_price', 'sum'), items=('quantity', 'sum'), skus=('sku', 'nunique') ).reset_index()
Now we’ll repeat the process but will use
agg() to create a
customers dataframe. We’ll calculate the date of each customer’s first and last order and store these. We’ll then calculate the
tenure of each customer by return a time delta of the two dates.
customers = order_items.groupby('customer_id').agg( revenue=('line_price', 'sum'), orders=('order_id', 'nunique'), skus=('sku', 'nunique'), items=('quantity', 'sum'), first_order_date=('order_date', 'min'), last_order_date=('order_date', 'max') ).reset_index().sort_values(by='revenue', ascending=False)
customers['tenure'] = round((customers['last_order_date'] - customers['first_order_date']) / np.timedelta64(1, 'D') ).astype(int)
|1703||14646.0||279489.02||77||703||196719||2010-12-20 10:09:00||2011-12-08 12:12:00||353|
|4233||18102.0||256438.49||62||151||64122||2010-12-07 16:42:00||2011-12-09 11:50:00||367|
|3758||17450.0||187482.17||55||127||69029||2010-12-07 09:23:00||2011-12-01 13:29:00||359|
|1895||14911.0||132572.62||248||1794||77180||2010-12-01 14:05:00||2011-12-08 15:54:00||372|
|55||12415.0||123725.45||26||444||77242||2011-01-06 11:12:00||2011-11-15 14:22:00||313|
Firstly, we’ll calculate the overall average order value or AOV for our dataset, which is done by dividing the total number of orders by the total revenue for the period we want to examine. Running the function shows us that we have a £376.36 AOV, which is very healthy.
def get_avg_order_value(revenue, orders): """Return the Average Order Value (AOV) Args: revenue (float): Total revenue orders (int): Total orders Return: avg_order_value (float): Average order value """ return revenue / orders
avg_order_value = get_avg_order_value(orders['revenue'].sum(), orders['order_id'].nunique()) avg_order_value
Next, we’ll calculate the average order frequency. That is, the average number of orders placed by each customer in the dataset. This is simply the total number of orders divided by the total number of customers. Each customer placed an average of 5.92 orders, which is very high.
def get_avg_order_frequency(orders, customers): """Return the Average Order Frequency (AOF) Args: orders (int): Total orders customers (int): Total customers Return: avg_order_frequency (float): Average Order Frequency (AOF) """ return orders / customers
avg_order_frequency = get_avg_order_frequency(orders['order_id'].nunique(), orders['customer_id'].nunique()) avg_order_frequency
Average customer value shows us how much each customer spent on average and is simply the total revenue divided by the total number of customers. For this dataset that’s an impressive £2229.58.
def get_avg_customer_value(revenue, customers): """Return the Average Customer Value. Args: revenue (float): Total revenue customers (int): Total customers Return: avg_customer_value (float): Average order value """ return revenue / customers
avg_customer_value = get_avg_customer_value(orders['revenue'].sum(), orders['customer_id'].nunique()) avg_customer_value
Average customer lifetime is basically the sum of all customer lifetimes (the amount of time between the acquisition and attrition of each customer) divided by the total number of customers, so this is where it gets messy.
Most of the heuristics for calculating CLV require you to know the average customer lifetime and are designed for contractual businesses, not non-contractual business, like retail. In the latter, you are never sure whether a customer is still a customer, so this is never going to be accurate.
def get_avg_customer_lifetime(total_lifetimes, customers): """Return the Average Customer Lifetime. This is for *contractual* businesses only. Args: total_lifetimes (float): The sum of total customer lifetimes customers (int): Total customers Return: avg_customer_lifetime (float): The average lifetime of each customer in days """ return total_lifetimes / customers
In the absence of the actual lifetime data, the closest thing we can use is their
tenure. That’s flawed, of course, but we don’t have anything else to use. We get back a figure of 133 days, which clearly isn’t going to be correct for this non-contractual business. You’d easily be able to calculate this in a contractual setting though.
avg_customer_lifetime = get_avg_customer_lifetime(customers['tenure'].sum(), customers['customer_id'].nunique()) avg_customer_lifetime
Finally, we can calculate a heuristic for the average “Customer Lifetime Value” by multiplying the average customer value by the average customer lifetime in years. That gives us a value of £816.
Obviously, this is flawed because we don’t know the actual customer lifetime duration in a non-contractual business, and also because it’s an aggregate measure, not one specific to each customer… However, it’s the sort of thing you’ll often see tools mispresent as “CLV”, even though it’s not.
def get_avg_clv(avg_customer_value, avg_customer_lifetime): """Return the average Customer Lifetime Value. This is NOT CLV. Args: avg_customer_value (float): Average customer value avg_customer_lifetime (float): Average customer lifetime in days Return: avg_clv (float): Average Customer Lifetime Value """ return avg_customer_value * (avg_customer_lifetime / 365)
avg_clv = get_avg_clv(avg_customer_value, avg_customer_lifetime) avg_clv
While these values aren’t CLV, and still aren’t that easy to calculate accurately in non-contractual retail settings, they’re better than nothing. To calculate CLV correctly, I’d highly recommend that instead you use a more robust methodology such as the combination of BG/NBD with Gamma-Gamma.
Matt Clarke, Wednesday, September 07, 2022