How to calculate CLV using BG/NBD and Gamma-Gamma

Calculating Customer Lifetime Value is hard to do right. Learn how to calculate CLV using the BG/NBD and Gamma-Gamma models in Python.

How to calculate CLV using BG/NBD and Gamma-Gamma
Picture by Rich Tervet, Unsplash.
39 minutes to read

Calculating Customer Lifetime Value or CLV is considered a really important thing in marketing and ecommerce, yet most companies can’t do it properly. This clever metric tells you the predicted value each customer will bring to your business over their lifetime, and as such requires the ability to detect which customers will churn as well as what they’re likely to spend if they’re retained.

Along with the RFM model, CLV is one of the most widely used models in marketing science today. Both are extremely useful for customer segmentation and a range of other ecommerce data science projects.

CLV can be used to help you focus your attention on retaining the most important customers (and acquiring others like them), and it can help you forecast what you’ll get from your customers if you can retain them. It’s also really useful for demonstrating the bigger picture to non-marketers who may think only of the operational costs of resolving issues, instead of the cost of losing a valuable customer.

How many of your customers are still customers?

In a contractual churn setting, like a mobile phone network or a broadband provider, you can tell how many of your customers are still customers (or “alive” as they’re called in the world of CLV), because you know how many open contracts you have. You observe the time at which they “die” because they fail to renew their contract and you get time to try and prevent them churning because you know the point at which they might die.

In non-contractual churn settings, like most ecommerce businesses, the time of a customer’s “death” isn’t directly observed, so you never really know which ones are alive or dead. That makes determining how many active customers you have much, much harder. It’s therefore little surprise that CLV isn’t measured as often as you might think, especially in smaller ecommerce businesses.

In many businesses, you’ll hear management say that their customers are considered active if they have made a purchase within an arbitrary time period, such as one to two years. However, if you’ve ever examined customer data, you’ll realise why this approach is flawed.

Customers often vary dramatically in their behaviour - some order weekly and others order annually. There’s no such thing as “typical customer.” If a customer normally orders from you every week and hasn’t shopped for a year, do you really think they’re still alive?

Calculating CLV

Here we’re going to use two different models - the Beta Geometric Negative Binomial Distribution or BG/NBD model and the Gamma-Gamma model - to predict the following:

  1. Which customers are still customers
  2. Who will order again in the next period
  3. The number of orders each customer will place
  4. The average order value of each customer’s order
  5. The total amount each customer will generate for the business

Load the packages

We need quite a few packages for this project. We’ll be using Pandas to load the data, Seaborn and Matplotlib to visualise the data and the excellent Lifetimes package by Cameron Davidson-Pilon to create the CLV model.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
from lifetimes import BetaGeoFitter
from lifetimes.utils import calibration_and_holdout_data
from lifetimes.utils import summary_data_from_transaction_data
from lifetimes.plotting import plot_frequency_recency_matrix
from lifetimes.plotting import plot_probability_alive_matrix
from lifetimes.plotting import plot_period_transactions
from lifetimes.plotting import plot_history_alive
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases
import warnings


Load transactional data

Calculating CLV only requires standard transactional data, comprising the order ID, the customer ID, the total value of the order and the order date, all of which are easily obtainable. Since we need to perform some date manipulations, we’ll need to ensure that the order date field has been set to a datetime object, which we can do with the to_datetime() Pandas function.

df_orders = pd.read_csv('data/transactions.csv')
df_orders['date_created'] = pd.to_datetime(df_orders.date_created)
order_id customer_id total_revenue date_created
0 299527 166958 74.01 2017-04-07 05:54:37
1 299528 191708 44.62 2017-04-07 07:32:54
2 299529 199961 16.99 2017-04-07 08:18:45
3 299530 199962 11.99 2017-04-07 08:20:00
4 299531 199963 14.49 2017-04-07 08:21:34

Calculate the raw recency, frequency, and monetary metrics

The next step is to turn our raw transactional data into the recency, frequency, monetary and tenure (T) metrics we need to provide to the CLV models. While it’s relatively straightforward to do this manually using Pandas, there’s a really useful helper function in the Lifetimes package to do it for you called summary_data_from_transaction_data().

You pass the function the dataframe containing your transactional data, define the columns that hold the customer ID, the order date, and the order value, and set an observation_period_end date. The observation_period_end date would usually represent the most recent order date within your dataframe. Running the function returns a new dataframe containing the raw RFM and T metrics.

df_rfmt = summary_data_from_transaction_data(df_orders, 

frequency recency T monetary_value
0 165.0 777.0 779.0 0.000000
6 9.0 1138.0 1147.0 22.045556
34 0.0 0.0 111.0 0.000000
44 12.0 1055.0 1251.0 50.914167
45 0.0 0.0 538.0 0.000000

Examine the statistical distribution of the data

Next, it’s a good idea to take a look at the statistical distribution of your data. This dataset is relatively typical for an ecommerce retailer. The data are strongly skewed, with most customers have placed a single low value order, and a long tail of higher value repeat customers contributing the rest.

ax = sns.distplot(df_rfmt['recency'])


ax = sns.distplot(df_rfmt['frequency'])


ax = sns.distplot(df_rfmt['monetary_value'])


The T plot shows the tenure of the customers, or when they were acquired, so the peaks and troughs here indicate that this business is strongly seasonal. The higher peaks towards the low end also show that this company has greatly increased its acquisition levels in recent years.

ax = sns.distplot(df_rfmt['T'])


Fit the BG/NBD model

Lifetimes comes with a number of different models you can fit to your data. The first one we’ll try is the Beta Geometric Negative Binomial Distribution or BG/NBD model. This model is based on the original Pareto/NBD model for CLV which was formulated by Schmittlein, Morrison, and Colombo in 1987.

This version, which was written by Fader, Hardy, and Lee in 2005, is designed to be just as effective as Pareto/NBD but much easier to implement. It’s become one of the most widely used CLV models around. We can fit the model using the frequency, recency, and T data and then view a summary.

bgf = BetaGeoFitter(penalizer_coef=0)['frequency'], df_rfmt['recency'], df_rfmt['T'])
coef se(coef) lower 95% bound upper 95% bound
r 0.106286 0.000921 0.104480 0.108092
alpha 30.786602 0.569320 29.670735 31.902468
a 0.515978 0.013150 0.490205 0.541751
b 0.856091 0.026806 0.803550 0.908631

Plot the recency and frequency of customers

Now the BG/NBD model has been fitted to the data, we can create predictions and visualise them. The plot_frequency_recency_matrix() function will do this for you. The matrix has the customer’s recency on the Y axis and the frequency on the X axis and the heatmap component is showing the predicted number of future purchases customers at intersecting points will make in one unit of time. The customers most likely to order are those who’ve placed lots of previous orders and have been seen relatively recently.

figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')

<matplotlib.axes._subplots.AxesSubplot at 0x7f10b200d0d0>


Plot the probability that each customer is alive

The next prediction we can visualise is the probability that each customer is “alive” based on their frequency and recency. We can do the whole thing by running plot_probability_alive_matrix(bgf) on the model we fitted earlier.

figure(num=None, figsize=(10, 10), dpi=80, facecolor='w', edgecolor='k')

<matplotlib.axes._subplots.AxesSubplot at 0x7f10b1fef820>


Predict the number of orders each customer will make during a time period

Personally, I find the raw data much more interesting and useful than the visualisations, so let’s use the BG/NBD model data to predict the number of purchases each customer will make in some forthcoming periods.

We can do this by using the conditional_expected_number_of_purchases_up_to_time() and passing in the number of days to predict in the future t, the frequency data, the recency data and the T data showing tenure. We’ll set t to 90 so we predicted for the 90 days after the end of the observation period.

t = 90
df_rfmt['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(t, 

If you print the tail() of the dataframe to which we assigned the predicted_purchases value from the model, you’ll see the customers ranked by the predicted number of orders they’re likely to place in the specified period. We’ve got a number of customers in this dataset who should place quite a few orders in the coming period. At the other end of the dataframe we’ll have a load who won’t.

frequency recency T monetary_value predicted_purchases
419431 13.0 131.0 131.0 100.810769 6.226748
379598 19.0 204.0 204.0 184.559474 6.541054
436281 12.0 82.0 91.0 28.352500 6.936018
371248 33.0 273.0 276.0 528.616364 8.880676
5349 170.0 1266.0 1267.0 264.371706 11.552999
370212 56.0 287.0 290.0 280.891429 14.516430
0 165.0 777.0 779.0 0.000000 17.768561
350047 115.0 394.0 397.0 652.167130 22.798098
350046 125.0 394.0 397.0 300.740800 24.781726
311109 250.0 530.0 533.0 3948.793840 38.104691

Plotting the predicted_purchases value shows we have a really long tail, but that most customers in the dataset aren’t going to order in the next 90 days. Obviously, this is based only on the customers in the dataset, so this prediction wouldn’t include any customers acquired since then or during the 90 day period, so will be higher in reality.

ax = sns.distplot(df_rfmt['predicted_purchases'])


Compare the model’s predictions to the actual data

Next we can assess how well the BetaGeoFitter BG/NBD model fits to our data. The plot_period_transactions() function will compare the actual observations from a previous time period to the model’s prediction for that time period. As you can see, the two are almost identical, which shows that the model is a very good fit and predicts the number of periods in the calibration period rather well.

<matplotlib.axes._subplots.AxesSubplot at 0x7f10b226a1f0>


Using a holdout group

The other, more robust, approach to making predictions is to use a calibration period and a holdout or observation period. The calibration period starts from the date of the first transaction in your dataset and ends here on 2020-06-01. Immediately after this period the observation period starts, and this ends on 2020-10-01. We therefore have several years of data in the calibration period and five months in the observation period. If you run the calibration_and_holdout_data() it will partition the data like this for you.

summary_cal_holdout = calibration_and_holdout_data(df_orders, 

As you’ll see below, we’ve now got a partitioned dataset. For each customer we’ve got _cal suffixed frequency, recency, and tenure metrics which contain data from the calibration period, plus a frequency_holdout value containing the actual number of orders they placed in the holdout period.

summary_cal_holdout.sort_values(by='frequency_holdout', ascending=False).head()
frequency_cal recency_cal T_cal frequency_holdout duration_holdout
0 108.0 654.0 657.0 57.0 122.0
5349 150.0 1134.0 1145.0 20.0 122.0
311109 232.0 404.0 411.0 18.0 122.0
370212 38.0 161.0 168.0 18.0 122.0
350047 97.0 268.0 275.0 18.0 122.0

Re-fit the model on the calibration period

Now we’ve got a dataset that’s partitioned into a calibration and validation or holdout period, we can re-fit the model just to the calibration data columns, frequency_cal, recency_cal and T_cal. The model now hasn’t seen any of the data for the holdout period.['frequency_cal'], 
<lifetimes.BetaGeoFitter: fitted with 145727 subjects, a: 0.48, alpha: 33.72, b: 0.83, r: 0.11>

Now that’s refitted, we can run plot_calibration_purchases_vs_holdout_purchases() to plot the predicted frequency of orders in the holdout period to the actual value. As you can see, the lines are pretty close together. The model is not far off at predicting the number of orders each customer will make.

plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b218f310>


Using a longer holdout period

To get a feel of how good the model is, you can also try adjusting the length of the observation period or holdout group and refitting the model to the new data. Here, rather than using a five month observation period, I’ve set it to one year to see how well the model copes with a longer period.

from lifetimes.utils import calibration_and_holdout_data
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

summary_cal_holdout = calibration_and_holdout_data(df_orders, 
<lifetimes.BetaGeoFitter: fitted with 91360 subjects, a: 0.49, alpha: 37.84, b: 0.83, r: 0.12>

Re-running plot_calibration_purchases_vs_holdout_purchases() shows that we get a better result with a longer period (perhaps unsurprisingly). The BG/NBD model is really good at predicting the number of orders customers will make over longer periods, but obviously this is tougher for shorter durations.

plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b20269d0>


Get predictions for a specific customer

If you want to generate predictions for specific customers you can simply pass in their index using iloc and extract their recency, frequency and T values from the calculated data and give them to the model to obtain a prediction. The customer in index 1 should place 2.47 orders in the next 365 days.

t = 365
individual = df_rfmt.iloc[1]

Plot customer probability history

The other neat thing you can do is examine the probability of each customer being alive over time. One of the main concepts of CLV models, like Buy Till You Die (BTYD) is that active customers will purchase “randomly” around their mean transaction rate. If a customer shops every 30 days on average, their orders might come every 20-40 days, for example.

This will vary across customers - some will binge, some will take a break from your business, but most will shop somewhere around the mean while they’re alive. That’s why calculating customer purchase latency is so useful. In the Dropout Process, each customer has an unobserved propensity to dropout or “die”. We don’t know what this figure will be and, like transaction rates, dropout propensities vary across customers.

If we look at customer 436281 from our original dataset, we can see that they’ve placed 15 orders since they were first acquired on 2020-07-22 about 118 days ago. Most of their orders are pretty small, and they’ve had a refund for something, but they come back regularly.

example_customer_orders = df_orders.loc[df_orders['customer_id'] == 436281]
order_id customer_id total_revenue date_created
240206 554824 436281 16.94 2020-07-02 22:03:40
242731 557349 436281 43.82 2020-07-09 14:30:59
243396 558014 436281 26.10 2020-07-11 14:13:06
246420 561038 436281 25.05 2020-07-19 13:12:53
248893 563511 436281 20.57 2020-07-23 18:41:39
248992 563610 436281 40.25 2020-07-24 01:16:16
255057 569675 436281 12.62 2020-08-06 15:46:45
259436 574054 436281 29.84 2020-08-16 10:40:33
262144 576762 436281 16.85 2020-08-22 12:55:15
262658 577276 436281 42.83 2020-08-23 22:09:23
265330 579948 436281 49.02 2020-08-30 11:30:59
270570 585188 436281 16.45 2020-09-12 18:53:28
274453 589071 436281 16.83 2020-09-22 07:37:58
280699 595317 436281 43.52 2020-10-11 16:14:29
283196 597814 436281 0.00 2020-10-19 16:15:39

By visualising the data on the customer above using plot_history_alive() we can see two things. Firstly, we can see the typical purchase latency. Their orders do follow the expected trend and they purchase around the mean of their latency. Secondly, we can see the decay in their probability of being alive that occurs when they pause between orders. The longer the gap gets above their mean, the more likely they are to churn or die.

figure(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')

days_since_birth = 118
plot_history_alive(bgf, days_since_birth, example_customer_orders, 'date_created')
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b2197250>


If we now change the days_since_birth value from the actual length of their tenure (118 days) to a longer period, we’ll see how their probability of being alive plummets. If they don’t order again before the end of the year, the chances of them being alive are very slim.

Either that, or they’re a seasonal customer and they might come back again next year. As you can see, we can predict the likelihood to order and visualise it any point in time with decent accuracy.

figure(num=None, figsize=(14, 6), dpi=80, facecolor='w', edgecolor='k')

days_since_birth = 182
plot_history_alive(bgf, days_since_birth, example_customer_orders, 'date_created')
<matplotlib.axes._subplots.AxesSubplot at 0x7f10b20a1c10>


Predicting customer lifetime value

Now that we’ve got the prediction of orders in the bag, let’s take a look at predicting their value. To do that we’ll use the Gamma-Gamma model. This was written by Peter Fader and Bruce Hardie in 2013 and was based on one of Fader’s earlier models for predicting spend.

It’s a bit like BG/NBD in some ways. It assumes that the monetary value of a customer’s transactions will vary randomly around their average order value, and that average order value varies across customers but doesn’t usually vary much within customers and is independent of the transaction process.

We can use the original df_rfmt dataframe we created earlier for our training data. We’ll filter this to include only the returning customers.

returning_customers = df_rfmt[df_rfmt['frequency'] > 0]
returning_customers = df_rfmt[df_rfmt['monetary_value'] > 0]
returning_customers.sort_values(by='monetary_value', ascending=False).head()
frequency recency T monetary_value predicted_purchases
394120 1.0 13.0 170.0 6099.85000 0.106700
311109 250.0 530.0 533.0 3948.79384 38.104691
378520 1.0 19.0 210.0 3120.00000 0.086171
349863 1.0 277.0 398.0 2849.00000 0.118736
390162 1.0 1.0 176.0 2847.84000 0.076285

One really important point about the Gamma-Gamma model is that it assumes that monetary value and frequency are independent variables and aren’t correlated with each other. For this reason, the monetary value you use here is based on the mean value of their orders rather than the total revenue, which obviously would be correlated. You can check that they’re not correlated using the corr() function which runs a Pearson correlation. The 0.1 scores we get mean we’re good to go.

returning_customers[['monetary_value', 'frequency']].corr()
monetary_value frequency
monetary_value 1.000000 0.109151
frequency 0.109151 1.000000

Fit the Gamma-Gamma model

Next we’ll fit the Gamma-Gamma model to the returning_customers dataframe we created above. We only need the frequency and monetary_value columns in order to fit the model, as we’re aiming to predict the monetary value of the orders they’ll place.

from lifetimes import GammaGammaFitter

ggf = GammaGammaFitter(penalizer_coef = 0)['frequency'],
<lifetimes.GammaGammaFitter: fitted with 36599 subjects, p: 2.64, q: 3.03, v: 44.66>

Predict average order value

If you run the conditional_expected_average_profit() function on the fitted ggf model you get back the predicted average order value for each customer. If you have customer-level profit data, you can also fit the model using this data instead of their revenue, providing it’s the mean and is consistently calculated.

predicted_monetary = ggf.conditional_expected_average_profit(

Now we’ll take the Series returned by the above function and assign it to a dataframe, placing the predicted AOV value in its own column and leaving the customer_id value in the index. If you sort the aov column in descending order and reassign the sorted data back to the dataframe, you can then look at the top and bottom of your customer list.

aov_df = pd.DataFrame(predicted_monetary,columns = ['aov'])
aov_df = aov_df.sort_values(by='aov', ascending=False)
311109 3936.861923
394120 3473.341229
291245 2131.907063
329785 1830.817183
378520 1788.906091
136467 12.175346
333479 12.123424
348326 11.854745
237556 10.680131
247246 10.116339

Here’s a plot of their AOVs, which follows the pattern you’d expect to see on a long-tailed dataset like this one. Most of our customers are going to spend only a modest amount, but there’s a long-tail of higher value customers and a little group of really big spenders on the far right.

ax = sns.distplot(aov_df['aov'])


Segment customers by their AOV

To get an idea of the spread of AOV across customers we can use quantile-based discretization or binning via qcut(). We can take the dataframe above and create five bins labeled 1 to 5, reset the index, and then create an aggregation to calculate the summary statistics based on the aov_bin.

aov_df['aov_bin'] = pd.qcut(aov_df['aov'], q=5, labels=[1, 2, 3, 4, 5])
aov_df = aov_df.reset_index()

Bin 1 contains the low CLV customers who have an average value of £29.16, the averages go up in each bin, rising to an average of £123.68 in bin 5. However, these include customers from £69.98 to £3936.86, so there’s quite a spread at the top end.

    count=('customer_id', 'count'),
    min_aov=('aov', min),
    max_aov=('aov', max),
    std_aov=('aov', 'std'),    
    aov=('aov', 'mean')
count min_aov max_aov std_aov aov
1 7320 10.116339 33.825442 3.977996 29.161352
2 7320 33.828518 41.293594 2.161160 37.302817
3 7321 41.294249 51.702518 2.992446 45.943007
4 7318 51.705722 69.978635 5.060488 59.194867
5 7320 69.981304 3936.861923 117.159528 123.684906

Applying K-means clustering

Using K-means clustering is another way to get a picture of the data. While the cluster numbers aren’t sorted in order, you can clearly see the differences in AOV across the five clusters. Cluster zero contains low spenders, averaging ~£44 per order, while clusters 1 and 3 include some very big spenders.

from sklearn.cluster import KMeans

aov_clusters = aov_df[['aov']]
kmeans = KMeans(n_clusters=5)

aov_clusters = aov_clusters.assign(cluster=kmeans.labels_)
3 3705.101576
1 1013.678602
2 313.632772
4 110.175450
0 44.454886

Predicting Customer Lifetime Value

Finally, we’ll get to the good bit - predicting Customer Lifetime Value. To do this we’ll use both models - the BG/NBD model to predict the number of orders and the Gamma-Gamma model to predict their values. First we’ll re-fit the BetaGeoFitter BG/NBD model to our dataset for the returning customers, which includes the monetary data.['frequency'], 
<lifetimes.BetaGeoFitter: fitted with 36599 subjects, a: 0.54, alpha: 164.34, b: 0.90, r: 1.65>

Next, we’ll use the Gamma-Gamma model’s customer_lifetime_value() function to predict their value. We’ll pass in the bgf model, along with the recency, frequency, monetary_value, and T data for each customers. The time parameter is in months and defines how many months into the future we wish to predict.

If you’ve not worked with CLV before you’re likely to be confused by the discount_rate argument. In CLV, discount has nothing to do with giving customers money off their orders. Instead, it relates to discounted cash flow (DCF), which is a financial measure for adjusting the cost of capital. This is a monthly adjusted discount rate and uses a default value of 0.01 in this package. You’ll likely need some help in calculating this if you want to set it precisely.

preds = ggf.customer_lifetime_value(
    time = 12,
    discount_rate = 0.01

preds = preds.to_frame().reset_index()

If you sort the dataframe in descending order of the predicted CLV, you’ll see who the model predicts your best customers to be over the next 12 months. There’s one massive one - perhaps a trade customer - plus a number of others who are really lucrative for the business.

preds.sort_values(by='clv', ascending=False).head()
customer_id clv
21119 311109 424690.737263
25136 350047 39223.600372
25135 350046 19667.144144
26994 371248 11335.564415
152 5349 10000.762875

At the bottom of the dataframe we’ve got all the customers who are predicted to spend next to nothing. These figures are likely to be so low they won’t even register as whole orders, of course.

preds.sort_values(by='clv', ascending=False).tail()
customer_id clv
13581 223154 1.167415e-03
6248 182159 1.055954e-03
4377 152909 1.174642e-04
9973 207386 7.647824e-06
14898 230097 1.615773e-18

Further reading

  • Fader, P.S., Hardie, B.G. and Lee, K.L., 2005. “Counting your customers” the easy way: An alternative to the Pareto/NBD model. Marketing science, 24(2), pp.275-284.

  • Fader, P.S. and Hardie, B.G., 2013. The Gamma-Gamma model of monetary value.

  • Schmittlein, D.C., Morrison, D.G. and Colombo, R., 1987. Counting your customers: Who-are they and what will they do next?. Management science, 33(1), pp.1-24.

Matt Clarke, Sunday, March 14, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.

Extreme Gradient Boosting with XGBoost

Learn the fundamentals of gradient boosting and build state-of-the-art machine learning models using XGBoost to solve classification and regression problems.

Start course for FREE