Calculating Customer Lifetime Value or CLV is considered a really important thing in marketing and ecommerce, yet most companies can’t do it properly. This clever metric tells you the predicted value each customer will bring to your business over their lifetime, and as such requires the ability to detect which customers will churn as well as what they’re likely to spend if they’re retained.
Along with the RFM model, CLV is one of the most widely used models in marketing science today. Both are extremely useful for customer segmentation and a range of other ecommerce data science projects.
CLV can be used to help you focus your attention on retaining the most important customers (and acquiring others like them), and it can help you forecast what you’ll get from your customers if you can retain them. It’s also really useful for demonstrating the bigger picture to non-marketers who may think only of the operational costs of resolving issues, instead of the cost of losing a valuable customer.
In a contractual churn setting, like a mobile phone network or a broadband provider, you can tell how many of your customers are still customers (or “alive” as they’re called in the world of CLV), because you know how many open contracts you have. You observe the time at which they “die” because they fail to renew their contract and you get time to try and prevent them churning because you know the point at which they might die.
In non-contractual churn settings, like most ecommerce businesses, the time of a customer’s “death” isn’t directly observed, so you never really know which ones are alive or dead. That makes determining how many active customers you have much, much harder. It’s therefore little surprise that CLV isn’t measured as often as you might think, especially in smaller ecommerce businesses.
In many businesses, you’ll hear management say that their customers are considered active if they have made a purchase within an arbitrary time period, such as one to two years. However, if you’ve ever examined customer data, you’ll realise why this approach is flawed.
Customers often vary dramatically in their behaviour - some order weekly and others order annually. There’s no such thing as “typical customer.” If a customer normally orders from you every week and hasn’t shopped for a year, do you really think they’re still alive?
Here we’re going to use two different models - the Beta Geometric Negative Binomial Distribution or BG/NBD model and the Gamma-Gamma model - to predict the following:
We need quite a few packages for this project. We’ll be using Pandas to load the data, Seaborn and Matplotlib to visualise the data and the excellent Lifetimes package by Cameron Davidson-Pilon to create the CLV model.
import pandas as pd import seaborn as sns import matplotlib.pyplot as plt from matplotlib.pyplot import figure from lifetimes import BetaGeoFitter from lifetimes.utils import calibration_and_holdout_data from lifetimes.utils import summary_data_from_transaction_data from lifetimes.plotting import plot_frequency_recency_matrix from lifetimes.plotting import plot_probability_alive_matrix from lifetimes.plotting import plot_period_transactions from lifetimes.plotting import plot_history_alive from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases import warnings warnings.filterwarnings('ignore')
Calculating CLV only requires standard transactional data, comprising the order ID, the customer ID, the total
value of the order and the order date, all of which are easily obtainable. Since we need to perform some date
manipulations, we’ll need to ensure that the order date field has been set to a datetime object, which we can do
to_datetime() Pandas function.
df_orders = pd.read_csv('data/transactions.csv') df_orders['date_created'] = pd.to_datetime(df_orders.date_created) df_orders.head()
The next step is to turn our raw transactional data into the recency, frequency, monetary and tenure (T) metrics we need to provide to the CLV models. While it’s relatively straightforward to do this manually using Pandas, there’s a really useful helper function in the Lifetimes package to do it for you called
You pass the function the dataframe containing your transactional data, define the columns that hold the customer ID,
the order date, and the order value, and set an
observation_period_end date. The
would usually represent the most recent order date within your dataframe. Running the function returns a new
dataframe containing the raw RFM and T metrics.
df_rfmt = summary_data_from_transaction_data(df_orders, 'customer_id', 'date_created', 'total_revenue', observation_period_end='2020-10-01') df_rfmt.head()
Next, it’s a good idea to take a look at the statistical distribution of your data. This dataset is relatively typical for an ecommerce retailer. The data are strongly skewed, with most customers have placed a single low value order, and a long tail of higher value repeat customers contributing the rest.
ax = sns.distplot(df_rfmt['recency'])
ax = sns.distplot(df_rfmt['frequency'])
ax = sns.distplot(df_rfmt['monetary_value'])
T plot shows the tenure of the customers, or when they were acquired, so the peaks and troughs here indicate that this business is strongly seasonal. The higher peaks towards the low end also show that this company has greatly increased its acquisition levels in recent years.
ax = sns.distplot(df_rfmt['T'])
Lifetimes comes with a number of different models you can fit to your data. The first one we’ll try is the Beta Geometric Negative Binomial Distribution or BG/NBD model. This model is based on the original Pareto/NBD model for CLV which was formulated by Schmittlein, Morrison, and Colombo in 1987.
This version, which was written by Fader, Hardy, and Lee in 2005, is designed to be just as effective as Pareto/NBD but much easier to implement. It’s become one of the most widely used CLV models around. We can fit the model using the
T data and then view a summary.
bgf = BetaGeoFitter(penalizer_coef=0) bgf.fit(df_rfmt['frequency'], df_rfmt['recency'], df_rfmt['T']) bgf.summary
|coef||se(coef)||lower 95% bound||upper 95% bound|
Now the BG/NBD model has been fitted to the data, we can create predictions and visualise them. The
plot_frequency_recency_matrix() function will do this for you. The matrix has the customer’s recency on the Y axis and the frequency on the X axis and the heatmap component is showing the predicted number of future purchases customers at intersecting points will make in one unit of time. The customers most likely to order are those who’ve placed lots of previous orders and have been seen relatively recently.
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k') plot_frequency_recency_matrix(bgf)
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b200d0d0>
The next prediction we can visualise is the probability that each customer is “alive” based on their frequency and recency. We can do the whole thing by running
plot_probability_alive_matrix(bgf) on the model we fitted earlier.
figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k') plot_probability_alive_matrix(bgf)
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b1fef820>
Personally, I find the raw data much more interesting and useful than the visualisations, so let’s use the BG/NBD model data to predict the number of purchases each customer will make in some forthcoming periods.
We can do this by using the
conditional_expected_number_of_purchases_up_to_time() and passing in the number of days to predict in the future
frequency data, the
recency data and the
T data showing tenure. We’ll set
t to 90 so we predicted for the 90 days after the end of the observation period.
t = 90 df_rfmt['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, df_rfmt['frequency'], df_rfmt['recency'], df_rfmt['T'])
If you print the
tail() of the dataframe to which we assigned the
predicted_purchases value from the model, you’ll see the customers ranked by the predicted number of orders they’re likely to place in the specified period. We’ve got a number of customers in this dataset who should place quite a few orders in the coming period. At the other end of the dataframe we’ll have a load who won’t.
predicted_purchases value shows we have a really long tail, but that most customers in the dataset aren’t going to order in the next 90 days. Obviously, this is based only on the customers in the dataset, so this prediction wouldn’t include any customers acquired since then or during the 90 day period, so will be higher in reality.
ax = sns.distplot(df_rfmt['predicted_purchases'])
Next we can assess how well the BetaGeoFitter BG/NBD model fits to our data. The
plot_period_transactions() function will compare the actual observations from a previous time period to the model’s prediction for that time period. As you can see, the two are almost identical, which shows that the model is a very good fit and predicts the number of periods in the calibration period rather well.
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b226a1f0>
The other, more robust, approach to making predictions is to use a calibration period and a holdout or observation period. The calibration period starts from the date of the first transaction in your dataset and ends here on
2020-06-01. Immediately after this period the observation period starts, and this ends on
2020-10-01. We therefore have several years of data in the calibration period and five months in the observation period. If you run the
calibration_and_holdout_data() it will partition the data like this for you.
summary_cal_holdout = calibration_and_holdout_data(df_orders, 'customer_id', 'date_created', calibration_period_end='2020-06-01', observation_period_end='2020-10-01')
As you’ll see below, we’ve now got a partitioned dataset. For each customer we’ve got
_cal suffixed frequency, recency, and tenure metrics which contain data from the calibration period, plus a
frequency_holdout value containing the actual number of orders they placed in the holdout period.
Now we’ve got a dataset that’s partitioned into a calibration and validation or holdout period, we can re-fit the model just to the calibration data columns,
T_cal. The model now hasn’t seen any of the data for the holdout period.
bgf.fit(summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])
<lifetimes.BetaGeoFitter: fitted with 145727 subjects, a: 0.48, alpha: 33.72, b: 0.83, r: 0.11>
Now that’s refitted, we can run
plot_calibration_purchases_vs_holdout_purchases() to plot the predicted frequency of orders in the holdout period to the actual value. As you can see, the lines are pretty close together. The model is not far off at predicting the number of orders each customer will make.
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b218f310>
To get a feel of how good the model is, you can also try adjusting the length of the observation period or holdout group and refitting the model to the new data. Here, rather than using a five month observation period, I’ve set it to one year to see how well the model copes with a longer period.
from lifetimes.utils import calibration_and_holdout_data from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases summary_cal_holdout = calibration_and_holdout_data(df_orders, 'customer_id', 'date_created', calibration_period_end='2019-10-01', observation_period_end='2020-10-01') bgf.fit(summary_cal_holdout['frequency_cal'], summary_cal_holdout['recency_cal'], summary_cal_holdout['T_cal'])
<lifetimes.BetaGeoFitter: fitted with 91360 subjects, a: 0.49, alpha: 37.84, b: 0.83, r: 0.12>
plot_calibration_purchases_vs_holdout_purchases() shows that we get a better result with a longer period (perhaps unsurprisingly). The BG/NBD model is really good at predicting the number of orders customers will make over longer periods, but obviously this is tougher for shorter durations.
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b20269d0>
If you want to generate predictions for specific customers you can simply pass in their index using
iloc and extract their
T values from the calculated data and give them to the model to obtain a prediction. The customer in index 1 should place 2.47 orders in the next 365 days.
t = 365 individual = df_rfmt.iloc bgf.predict(t, individual['frequency'], individual['recency'], individual['T'])
The other neat thing you can do is examine the probability of each customer being alive over time. One of the main concepts of CLV models, like Buy Till You Die (BTYD) is that active customers will purchase “randomly” around their mean transaction rate. If a customer shops every 30 days on average, their orders might come every 20-40 days, for example.
This will vary across customers - some will binge, some will take a break from your business, but most will shop somewhere around the mean while they’re alive. That’s why calculating customer purchase latency is so useful. In the Dropout Process, each customer has an unobserved propensity to dropout or “die”. We don’t know what this figure will be and, like transaction rates, dropout propensities vary across customers.
If we look at customer 436281 from our original dataset, we can see that they’ve placed 15 orders since they were first acquired on 2020-07-22 about 118 days ago. Most of their orders are pretty small, and they’ve had a refund for something, but they come back regularly.
example_customer_orders = df_orders.loc[df_orders['customer_id'] == 436281] example_customer_orders
By visualising the data on the customer above using
plot_history_alive() we can see two things. Firstly, we can
see the typical purchase latency. Their orders do follow the expected trend and they purchase around the mean of
their latency. Secondly, we can see the decay in their probability of being alive that occurs when they pause between orders. The longer the gap gets above their mean, the more likely they are to churn or die.
figure(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k') days_since_birth = 118 plot_history_alive(bgf, days_since_birth, example_customer_orders, 'date_created')
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b2197250>
If we now change the
days_since_birth value from the actual length of their tenure (118 days) to a longer period, we’ll see how their probability of being alive plummets. If they don’t order again before the end of the year, the chances of them being alive are very slim.
Either that, or they’re a seasonal customer and they might come back again next year. As you can see, we can predict the likelihood to order and visualise it any point in time with decent accuracy.
figure(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k') days_since_birth = 182 plot_history_alive(bgf, days_since_birth, example_customer_orders, 'date_created')
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b20a1c10>
Now that we’ve got the prediction of orders in the bag, let’s take a look at predicting their value. To do that we’ll use the Gamma-Gamma model. This was written by Peter Fader and Bruce Hardie in 2013 and was based on one of Fader’s earlier models for predicting spend.
It’s a bit like BG/NBD in some ways. It assumes that the monetary value of a customer’s transactions will vary randomly around their average order value, and that average order value varies across customers but doesn’t usually vary much within customers and is independent of the transaction process.
We can use the original
df_rfmt dataframe we created earlier for our training data. We’ll filter this to include only the returning customers.
returning_customers = df_rfmt[df_rfmt['frequency'] > 0] returning_customers = df_rfmt[df_rfmt['monetary_value'] > 0] returning_customers.sort_values(by='monetary_value', ascending=False).head()
One really important point about the Gamma-Gamma model is that it assumes that monetary value and frequency are independent variables and aren’t correlated with each other. For this reason, the monetary value you use here is based on the mean value of their orders rather than the total revenue, which obviously would be correlated. You can check that they’re not correlated using the
corr() function which runs a Pearson correlation. The 0.1 scores we get mean we’re good to go.
Next we’ll fit the Gamma-Gamma model to the
returning_customers dataframe we created above. We only need the
monetary_value columns in order to fit the model, as we’re aiming to predict the monetary value of the orders they’ll place.
from lifetimes import GammaGammaFitter ggf = GammaGammaFitter(penalizer_coef = 0) ggf.fit(returning_customers['frequency'], returning_customers['monetary_value'])
<lifetimes.GammaGammaFitter: fitted with 36599 subjects, p: 2.64, q: 3.03, v: 44.66>
If you run the
conditional_expected_average_profit() function on the fitted
ggf model you get back the predicted average order value for each customer. If you have customer-level profit data, you can also fit the model using this data instead of their revenue, providing it’s the mean and is consistently calculated.
predicted_monetary = ggf.conditional_expected_average_profit( returning_customers['frequency'], returning_customers['monetary_value'] )
Now we’ll take the Series returned by the above function and assign it to a dataframe, placing the predicted AOV value in its own column and leaving the
customer_id value in the index. If you sort the
aov column in descending order and reassign the sorted data back to the dataframe, you can then look at the top and bottom of your customer list.
aov_df = pd.DataFrame(predicted_monetary,columns = ['aov']) aov_df = aov_df.sort_values(by='aov', ascending=False)
Here’s a plot of their AOVs, which follows the pattern you’d expect to see on a long-tailed dataset like this one. Most of our customers are going to spend only a modest amount, but there’s a long-tail of higher value customers and a little group of really big spenders on the far right.
ax = sns.distplot(aov_df['aov'])
To get an idea of the spread of AOV across customers we can use quantile-based discretization
or binning via
We can take the dataframe above and create five bins labeled 1 to 5, reset the index, and then create an aggregation to calculate the summary statistics based on the
aov_df['aov_bin'] = pd.qcut(aov_df['aov'], q=5, labels=[1, 2, 3, 4, 5]) aov_df = aov_df.reset_index()
Bin 1 contains the low CLV customers who have an average value of £29.16, the averages go up in each bin, rising to an average of £123.68 in bin 5. However, these include customers from £69.98 to £3936.86, so there’s quite a spread at the top end.
aov_df.groupby('aov_bin').agg( count=('customer_id', 'count'), min_aov=('aov', min), max_aov=('aov', max), std_aov=('aov', 'std'), aov=('aov', 'mean') ).sort_values(by='aov')
Using K-means clustering is another way to get a picture of the data. While the cluster numbers aren’t sorted in order, you can clearly see the differences in AOV across the five clusters. Cluster zero contains low spenders, averaging ~£44 per order, while clusters 1 and 3 include some very big spenders.
from sklearn.cluster import KMeans aov_clusters = aov_df[['aov']] kmeans = KMeans(n_clusters=5) kmeans.fit(aov_clusters) aov_clusters = aov_clusters.assign(cluster=kmeans.labels_) aov_clusters.groupby('cluster')['aov'].mean().sort_values(ascending=False).to_frame()
Finally, we’ll get to the good bit - predicting Customer Lifetime Value. To do this we’ll use both models - the BG/NBD model to predict the number of orders and the Gamma-Gamma model to predict their values. First we’ll re-fit the BetaGeoFitter BG/NBD model to our dataset for the returning customers, which includes the monetary data.
bgf.fit(returning_customers['frequency'], returning_customers['recency'], returning_customers['T'])
<lifetimes.BetaGeoFitter: fitted with 36599 subjects, a: 0.54, alpha: 164.34, b: 0.90, r: 1.65>
Next, we’ll use the Gamma-Gamma model’s
customer_lifetime_value() function to predict their value. We’ll pass in the
bgf model, along with the
T data for each customers. The
time parameter is in months and defines how many months into the future we wish to predict.
If you’ve not worked with CLV before you’re likely to be confused by the
discount_rate argument. In CLV, discount has nothing to do with giving customers money off their orders. Instead, it relates to discounted cash flow (DCF), which is a financial measure for adjusting the cost of capital. This is a monthly adjusted discount rate and uses a default value of 0.01 in this package. You’ll likely need some help in calculating this if you want to set it precisely.
preds = ggf.customer_lifetime_value( bgf, returning_customers['frequency'], returning_customers['recency'], returning_customers['T'], returning_customers['monetary_value'], time = 12, discount_rate = 0.01 ) preds = preds.to_frame().reset_index()
If you sort the dataframe in descending order of the predicted CLV, you’ll see who the model predicts your best customers to be over the next 12 months. There’s one massive one - perhaps a trade customer - plus a number of others who are really lucrative for the business.
At the bottom of the dataframe we’ve got all the customers who are predicted to spend next to nothing. These figures are likely to be so low they won’t even register as whole orders, of course.
Fader, P.S., Hardie, B.G. and Lee, K.L., 2005. “Counting your customers” the easy way: An alternative to the Pareto/NBD model. Marketing science, 24(2), pp.275-284.
Fader, P.S. and Hardie, B.G., 2013. The Gamma-Gamma model of monetary value.
Schmittlein, D.C., Morrison, D.G. and Colombo, R., 1987. Counting your customers: Who-are they and what will they do next?. Management science, 33(1), pp.1-24.
Matt Clarke, Sunday, March 14, 2021
Learn the fundamentals of gradient boosting and build state-of-the-art machine learning models using XGBoost to solve classification and regression problems.Start course for FREE