How to use the BG/NBD model to predict customer purchases

The Beta-Geometric Negative Binomial Distribution or BG/NBD model lets you predict which customers will order in the next period. Here's how it works.


You might think human behaviour would be hard to predict but, in ecommerce data science, forecasting whether a customer will purchase in the next period is not as difficult as it sounds.

Once they’ve placed a few orders, customers behave in quite predictable ways, and the science behind it is really quite logical. All you really need to make these predictions is some transactional data, comprising one order per line, and the ability to write some Python code to manipulate the data and put it through a model called the Beta-Geometric Negative Binomial Distribution, or BG/NBD.

The BG/NBD model is an improvement on the earlier Pareto/NBD model and uses the same “Buy ‘Til You Die” approach, which allows you to calculate the probability of a customer being “alive” (or still a customer) at a given point in the future.

Here we’ll take a transactional dataset, use Lifetimes to calculate some RFM metrics, and then predict the probability that each customer is alive and estimate the number of orders each one will place in the next period.

Load data

First, we’ll load our transactional data. This is standard stuff in ecommerce and comprises the unique order ID, the customer ID, the total value of the order and the date on which it was placed. In our dataset we also have the channel and country fields, but we don’t need those. We’ve also got a redundant column called “Unnamed: 0”, so we’ll drop this to tidy things up.

import pandas as pd

df_orders = pd.read_csv('data/orders.csv')
df_orders.drop(['Unnamed: 0'], axis=1, inplace=True)
df_orders.head()
order_id customer_id channel country total_revenue order_date
0 299527 166958 1 231 74.01 2017-04-07 04:55:58
1 299528 191708 1 231 44.62 2017-04-07 06:34:07
2 299529 199961 5 231 16.99 2017-04-07 07:18:50
3 299530 199962 5 231 11.99 2017-04-07 07:20:25
4 299531 199963 5 231 14.49 2017-04-07 07:21:40

Create frequency, recency, T

In ecommerce data science almost everything revolves around four or five key metrics, which are really all derivatives of recency (R), frequency (F) and monetary value (M). These form the basis of the popular RFM model which has been used in marketing for decades. The other two metrics that matter in ecommerce are tenure (T) - how long the customer has been a customer - and latency - the number of days between their orders. It’s pretty easy to calculate these manually, however, we’re going to use Cameron Davidson-Pilon’s superb Lifetimes package as it does this easily and gives you access to some models to analyse the data.
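As an aside, latency really is straightforward to calculate by hand. This is a minimal sketch over a hypothetical transactions frame (using the same customer_id and order_date column names as the dataset below) that computes the gap in days between each customer's consecutive orders:

```python
import pandas as pd

# Hypothetical transactions: one row per order, mirroring the
# customer_id and order_date columns in the article's dataset
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2],
    'order_date': pd.to_datetime([
        '2021-01-01', '2021-01-11', '2021-01-31',
        '2021-02-01', '2021-03-01',
    ]),
})

# Latency = days between a customer's consecutive orders
df = df.sort_values(['customer_id', 'order_date'])
df['latency_days'] = df.groupby('customer_id')['order_date'].diff().dt.days

# Average latency per customer; a customer's first order has no
# preceding gap, so its NaN is skipped by mean()
avg_latency = df.groupby('customer_id')['latency_days'].mean()
print(avg_latency)
```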

First, you will need to install Lifetimes by entering pip3 install lifetimes and then import the summary_data_from_transaction_data() function from lifetimes.utils. By passing this function your Pandas dataframe of transactions and defining the columns that contain the customer ID and the order date, the Lifetimes helper will calculate the frequency, recency, and tenure (or age) for you.

from lifetimes.utils import summary_data_from_transaction_data

data = summary_data_from_transaction_data(df_orders,
                                          customer_id_col='customer_id',
                                          datetime_col='order_date')
data.head()

If you print the head() of the data dataframe returned by summary_data_from_transaction_data() you’ll see that it’s identified each unique customer and has calculated their Recency, Frequency and T. There are lots of different ways to calculate similar metrics, so it’s worth getting to grips with what Lifetimes does.

Recency represents the age of the customer in days when they made their most recent purchase and is calculated from their tenure minus the number of days since their last order. A recency of zero indicates a newly acquired customer. Frequency measures the number of repeat orders a customer has placed, so a value of zero indicates a new customer who has placed a single order, a value of 1 indicates a customer placing their second order, and so on. T measures the tenure of the customer in days - that is, how many days have elapsed since their first order.

frequency recency T
6 5.0 990.0 1123.0
34 0.0 0.0 87.0
44 12.0 1055.0 1227.0
45 0.0 0.0 514.0
71 6.0 1128.0 1240.0
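If you want to sanity-check what Lifetimes is doing, the same three metrics can be reproduced with plain Pandas. This is a minimal sketch over a hypothetical transactions frame, following the definitions above:

```python
import pandas as pd

# Hypothetical transactions, one row per order
df = pd.DataFrame({
    'customer_id': [1, 1, 1, 2],
    'order_date': pd.to_datetime([
        '2021-01-01', '2021-01-15', '2021-02-01', '2021-02-10',
    ]),
})
observation_end = pd.Timestamp('2021-03-01')  # end of the observed window

grouped = df.groupby('customer_id')['order_date']
summary = pd.DataFrame({
    # frequency: repeat purchase dates (first order doesn't count)
    'frequency': grouped.nunique() - 1,
    # recency: days between first and last order
    'recency': (grouped.max() - grouped.min()).dt.days,
    # T: tenure in days, from first order to the end of the window
    'T': (observation_end - grouped.min()).dt.days,
})
print(summary)
```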

Fit the Beta-Geometric Negative Binomial Distribution model

Now that we have our basic customer data set up, we can fit a model. Lifetimes includes several models. First, we will use the BetaGeoFitter model, which provides the Beta-Geometric Negative Binomial Distribution model that is common to the so-called “Buy ‘Til You Die” customer lifetime models. To fit the model, we simply pass in the dataframe columns containing the frequency, recency, and tenure data.

from lifetimes import BetaGeoFitter

bgf = BetaGeoFitter(penalizer_coef=0.0)
bgf.fit(data['frequency'], data['recency'], data['T'])
bgf.summary
coef se(coef) lower 95% bound upper 95% bound
r 0.108855 0.000988 0.106919 0.110792
alpha 34.224816 0.647906 32.954920 35.494713
a 0.503406 0.013369 0.477203 0.529609
b 0.834213 0.027241 0.780821 0.887606

Visualise the Recency/Frequency matrix

To examine the output of the model, we can pass the fitted bgf model to plot_frequency_recency_matrix(). A recency/frequency matrix shows the probability that a customer is still a customer, or “alive”, based on their inter-purchase latency, or the gap between orders. If a customer usually orders every week and hasn’t been seen for a few months, their probability of being alive is low. However, if a customer orders every few months and bought a couple of months ago, then they’re probably still alive.

A typical recency/frequency matrix shows a long tail at the bottom of the matrix. In the example below, a customer who has a frequency of 200+ and a recency of 1200+ days (meaning they were still placing orders 1200+ days after their first purchase) is likely to be alive.

from lifetimes.plotting import plot_frequency_recency_matrix

plot_frequency_recency_matrix(bgf)


Plot the probability of customers being alive

To identify the probability of whether customers are alive, you can pass the fitted model to the plot_probability_alive_matrix() function. Here, we can see that customers who have ordered very recently are likely to be alive, and the probability increases with the number of orders placed.

from lifetimes.plotting import plot_probability_alive_matrix

plot_probability_alive_matrix(bgf)
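For individual customers, Lifetimes exposes this calculation as bgf.conditional_probability_alive(). The underlying BG/NBD expression (from Fader, Hardie and Lee’s 2005 paper) is simple enough to sketch by hand using the coefficients fitted above (rounded here for readability):

```python
# BG/NBD probability-alive expression (Fader, Hardie & Lee, 2005),
# using the r, alpha, a, b coefficients fitted earlier (rounded).
# Lifetimes computes the same thing via bgf.conditional_probability_alive().
r, alpha, a, b = 0.1089, 34.22, 0.5034, 0.8342

def p_alive(frequency, recency, T):
    """P(customer is still 'alive' | repeat purchases x, recency t_x, age T)."""
    if frequency == 0:
        # In BG/NBD, dropout can only happen after a purchase, so a
        # customer with no repeat purchases is assumed alive
        return 1.0
    ratio = (a / (b + frequency - 1)) * (
        (alpha + T) / (alpha + recency)
    ) ** (r + frequency)
    return 1.0 / (1.0 + ratio)

# A frequent buyer seen recently is very likely alive...
print(p_alive(12, 1055, 1060))
# ...while the same buyer silent for two years is probably gone
print(p_alive(12, 1055, 1785))
```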


Predict which customers will order in the next period

The other powerful thing you can do with the BG/NBD model is predict the number of purchases each customer is likely to make over the next period. Here, we set t to be 30 so the model predicts the number of purchases each customer will make in the next 30 days, then we output the predictions and sort the results.

t = 30
data['predicted_purchases'] = bgf.conditional_expected_number_of_purchases_up_to_time(t,
                                                                                      data['frequency'],
                                                                                      data['recency'],
                                                                                      data['T'])
data.sort_values(by='predicted_purchases').tail(10)
frequency recency T predicted_purchases
236204 56.0 786.0 790.0 2.000242
379598 16.0 178.0 180.0 2.104599
437951 8.0 52.0 59.0 2.168156
306933 12.0 93.0 101.0 2.344462
371248 29.0 246.0 252.0 2.881791
5349 164.0 1239.0 1243.0 3.812563
370212 52.0 260.0 266.0 4.945195
350047 111.0 367.0 373.0 7.853591
350046 122.0 367.0 373.0 8.616947
311109 246.0 503.0 509.0 12.998316
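Because predicted_purchases is just another column on the summary frame, ordinary Pandas filtering picks out the customers worth targeting. A minimal sketch, with synthetic values standing in for the model output above:

```python
import pandas as pd

# Synthetic stand-in for the summary frame with the model's
# predicted_purchases column attached
data = pd.DataFrame({
    'customer_id': [6, 44, 45, 71],
    'predicted_purchases': [0.12, 0.34, 0.01, 1.8],
}).set_index('customer_id')

# Customers the model expects to order at least once in the next period
likely_buyers = data[data['predicted_purchases'] >= 0.5]
print(likely_buyers.sort_values('predicted_purchases', ascending=False))
```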

To determine the fit of our BG/NBD model, and the probable accuracy of its predictions, we can plot the data to assess it. As with the other Lifetimes plotting functions, we just need to pass the fitted bgf model to plot_period_transactions() and it will return a Matplotlib chart showing the predictions against the actuals. If the Actual and Model bars are similar, our model is pretty good at making these predictions.

from lifetimes.plotting import plot_period_transactions

plot_period_transactions(bgf)


Testing the model’s predictions with a holdout group

To properly test the BG/NBD model it’s best to create a partitioned dataset. Here we will create a calibration period in which to train the model and then create a holdout period to validate our model. The model never gets to see the data in the holdout group, but we can compare the accuracy of the prediction with the known number of purchases after the model has run.

The calibration_and_holdout_data() function creates that partitioned dataset for you. It’s much like the one we created above, but its columns hold the frequency, recency, and tenure during the calibration period, plus the duration of the holdout period in days and the actual number of purchases each customer made within the holdout period.

from lifetimes.utils import calibration_and_holdout_data

# The period end dates here are illustrative; set them to suit your dataset
summary_cal_holdout = calibration_and_holdout_data(df_orders,
                                                   customer_id_col='customer_id',
                                                   datetime_col='order_date',
                                                   calibration_period_end='2020-07-01',
                                                   observation_period_end='2020-09-30')
summary_cal_holdout.head()
frequency_cal recency_cal T_cal frequency_holdout duration_holdout
6 5.0 990.0 1025.0 0.0 91.0
44 12.0 1055.0 1129.0 0.0 91.0
45 0.0 0.0 416.0 0.0 91.0
71 6.0 1128.0 1142.0 0.0 91.0
242 0.0 0.0 1128.0 0.0 91.0

By plotting the actual number of purchases in the holdout period against the model’s predictions we can eyeball the model’s accuracy. For customers who placed three purchases in the calibration period, we’d expect to see about 0.3 in the holdout period.

from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

bgf.fit(summary_cal_holdout['frequency_cal'],
        summary_cal_holdout['recency_cal'],
        summary_cal_holdout['T_cal'])
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)


Setting a longer holdout period might be required to give you more useful predictions, but it depends on the dataset and the typical frequency with which your customers shop. If we re-run the model with the holdout period set to a year, instead of a few months, we can see that the predictions are pretty close to the actuals. You can, of course, output the data itself in Pandas and re-join it to your original data to double-check it at an individual customer level.
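Beyond eyeballing the chart, you can also quantify the error on the holdout frame. This is a minimal sketch using mean absolute error, where the synthetic frequency_holdout values stand in for the actuals and the hypothetical predicted_holdout column is one you could fill using bgf.predict() on the calibration metrics:

```python
import pandas as pd

# Synthetic stand-in for the holdout frame: frequency_holdout holds the
# actual purchase counts, predicted_holdout the model's predictions
summary = pd.DataFrame({
    'frequency_holdout': [0.0, 1.0, 0.0, 2.0],
    'predicted_holdout': [0.1, 0.8, 0.3, 1.6],
})

# Mean absolute error between actual and predicted holdout purchases
mae = (summary['frequency_holdout'] - summary['predicted_holdout']).abs().mean()
print(round(mae, 3))
```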

from lifetimes.utils import calibration_and_holdout_data
from lifetimes.plotting import plot_calibration_purchases_vs_holdout_purchases

# Again, the period end dates are illustrative; the holdout now spans a year
summary_cal_holdout = calibration_and_holdout_data(df_orders,
                                                   customer_id_col='customer_id',
                                                   datetime_col='order_date',
                                                   calibration_period_end='2019-09-30',
                                                   observation_period_end='2020-09-30')

bgf.fit(summary_cal_holdout['frequency_cal'],
        summary_cal_holdout['recency_cal'],
        summary_cal_holdout['T_cal'])
plot_calibration_purchases_vs_holdout_purchases(bgf, summary_cal_holdout)


Matt Clarke, Wednesday, March 03, 2021

Matt Clarke Matt is a Digital Director who uses data science to help in his work. He has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.
