One common conundrum in e-commerce and marketing is ascertaining whether a given change in marketing activity, product price, or site design or content has made a statistically significant difference to performance.
Since marketing time series data often have an upward trend, or various types of underlying seasonality, a simple before-and-after comparison won't give accurate results, even though it's what most people do. Even more sophisticated Marketing Mix Models based on machine learning can be unreliable at times.
Data scientists at Google came up with an innovative solution to this problem back in 2015 by developing a causal impact model (Brodersen et al., 2015). This takes a time series dataset covering a pre-treatment period and a post-treatment period, and aims to identify whether the treatment had a statistically significant impact on a chosen metric.
Unlike simpler approaches, Causal Impact does this by forecasting what would have happened based on the pre-treatment period, and then comparing what actually happened with that counterfactual forecast.
For example, let’s say you used deep learning to generate new product descriptions for all the pages on your e-commerce site and you wanted to observe whether this had worked or not. The work was delivered to the site on a single date and the traffic was trending upwards anyway, so a simple before-and-after test would show an improvement even if you’d done nothing.
By running this scenario through a causal impact model, you could take a metric such as the number of organic search impressions generated, and forecast what you'd expect to see in the next period if nothing had changed. Then you'd be able to assess whether the actual number of impressions fell significantly outside the forecast range, giving you a good indicator of whether the changes worked or not.
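To see why the counterfactual approach matters, here's a hand-rolled sketch using NumPy and entirely made-up numbers (this is not the Causal Impact model itself, just an illustration of the idea): we simulate daily clicks with an upward trend, apply a known lift halfway through, and compare a naive before-and-after estimate against a trend-based counterfactual forecast.

```python
import numpy as np

rng = np.random.default_rng(42)

# 28 days of synthetic daily clicks: an upward trend plus noise,
# with a genuine +50-click lift applied from day 14 onwards
days = np.arange(28)
clicks = 200 + 3 * days + rng.normal(0, 5, size=28)
clicks[14:] += 50

pre, post = clicks[:14], clicks[14:]

# Naive before-vs-after comparison: inflated, because it also
# captures the underlying upward trend
naive_lift = post.mean() - pre.mean()

# Counterfactual approach: forecast the post period by extending
# the pre-period trend, then compare actual against forecast
slope, intercept = np.polyfit(days[:14], pre, 1)
forecast = intercept + slope * days[14:]
counterfactual_lift = (post - forecast).mean()

print(f"Naive lift: {naive_lift:.1f} clicks per day")
print(f"Counterfactual lift: {counterfactual_lift:.1f} clicks per day")
```

The naive estimate roughly doubles the true effect because it attributes the underlying trend to the treatment; the counterfactual estimate recovers something close to the +50 lift we injected. Causal Impact applies the same logic with a far more sophisticated Bayesian structural time series model.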
As you might imagine, there are loads of potential applications for this in e-commerce and marketing.
When Brodersen and colleagues published their research in 2015, they also released an R package called CausalImpact, designed to allow marketers to run their algorithm and predict causal impact from time series data. In recent years, this has been ported to Python. In this project, I'll show you how to use it.
The Causal Impact implementation we're using is pycausalimpact, which was written by Will Fuks. You can install it from PyPI by entering `pip3 install pycausalimpact` in your terminal, or by executing the code below in a Jupyter notebook.
Once installed, import the Pandas package and the `CausalImpact` module from `causalimpact`. This uses a number of other Python packages, including `statsmodels`, which currently throws a few ugly warnings in places, so I'd recommend adding the additional block of code below to hide the warnings and keep your notebook clutter-free.
!pip3 install pycausalimpact
import pandas as pd
from causalimpact import CausalImpact
import sys
import warnings
if not sys.warnoptions:
    warnings.simplefilter("ignore")
Next, we'll load a time series dataset to examine using the Causal Impact model. I've created an example dataset from some Google Search Console data where a change was made after July 17th, 2021. First, use the Pandas `read_csv()` function to load the data and view the dataframe, then use `info()` to show the structure and data types present.
df = pd.read_csv('https://raw.githubusercontent.com/flyandlure/datasets/master/causal_impact_dataset.csv')
df.head()
| | date | clicks | impressions | ctr | position |
|---|---|---|---|---|---|
| 0 | 2021-07-04 | 136 | 6301 | 2.16 | 31.97 |
| 1 | 2021-07-05 | 264 | 8697 | 3.04 | 27.66 |
| 2 | 2021-07-06 | 299 | 9236 | 3.24 | 26.38 |
| 3 | 2021-07-07 | 276 | 10008 | 2.76 | 26.77 |
| 4 | 2021-07-08 | 283 | 9725 | 2.91 | 25.83 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28 entries, 0 to 27
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 date 28 non-null object
1 clicks 28 non-null int64
2 impressions 28 non-null int64
3 ctr 28 non-null float64
4 position 28 non-null float64
dtypes: float64(2), int64(2), object(1)
memory usage: 1.2+ KB
PyCausalImpact requires the dataframe to be in a specific format in order to work, so we need to make some minor changes before we pass it to the model. Firstly, we need to change the `date` column from its current `object` data type to a datetime, which we can do using the `to_datetime()` function.
Then we need to convert the regular dataframe to a date-indexed dataframe by passing the `date` column to `set_index()`. That gives us a dataframe containing our `clicks`, `impressions`, `ctr`, and `position` for each `date`, but with the date assigned to the index rather than its own column.
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df.head()
| date | clicks | impressions | ctr | position |
|---|---|---|---|---|
| 2021-07-04 | 136 | 6301 | 2.16 | 31.97 |
| 2021-07-05 | 264 | 8697 | 3.04 | 27.66 |
| 2021-07-06 | 299 | 9236 | 3.24 | 26.38 |
| 2021-07-07 | 276 | 10008 | 2.76 | 26.77 |
| 2021-07-08 | 283 | 9725 | 2.91 | 25.83 |
df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 28 entries, 2021-07-04 to 2021-07-31
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 clicks 28 non-null int64
1 impressions 28 non-null int64
2 ctr 28 non-null float64
3 position 28 non-null float64
dtypes: float64(2), int64(2)
memory usage: 1.1 KB
Now we've got our data prepared, we need to define the `pre_period` and `post_period` dates, which need to be provided in `list` format. The `pre_period` list contains two dates: the start date is the earliest date in the dataframe (2021-07-04), while the end date is the day before the site change was made. The `post_period` contains the date on which the change was made, plus the end date for this period.
pre_period = ['2021-07-04', '2021-07-17']
post_period = ['2021-07-18', '2021-07-31']
All we need to do now is pass in the dataframe column containing the metric we want to examine (i.e. `clicks`), plus the `pre_period` and `post_period` lists containing the four dates. There are various other settings you can configure with `CausalImpact`, but we'll just create a simple base model here. As this dataset is very small, the model should fit in seconds, returning an object called `model` that you can manipulate in a number of different ways.
model = CausalImpact(df['clicks'], pre_period, post_period)
First, I'd recommend printing the output of the `summary(output='report')` function. This returns a verbose analysis of the model results, which explains exactly what was found and how you should interpret it. For our dataset, the model predicted that, in the absence of the intervention, we should have expected an average of 243 clicks in the post-intervention period, but we actually generated 344.
By subtracting the model's counterfactual prediction from the actual number of clicks in the post-intervention period, you get the causal effect the intervention probably had on your response variable, the clicks. The site changes increased clicks by nearly 42%, which was statistically significant and is unlikely to be down to random fluctuation (although it may, of course, have been caused by something else).
print(model.summary(output='report'))
Analysis report {CausalImpact}
During the post-intervention period, the response variable had
an average value of approx. 344.36. By contrast, in the absence of an
intervention, we would have expected an average response of 243.01.
The 95% interval of this counterfactual prediction is [183.22, 298.1].
Subtracting this prediction from the observed response yields
an estimate of the causal effect the intervention had on the
response variable. This effect is 101.35 with a 95% interval of
[46.26, 161.13]. For a discussion of the significance of this effect,
see below.
Summing up the individual data points during the post-intervention
period (which can only sometimes be meaningfully interpreted), the
response variable had an overall value of 4821.0.
By contrast, had the intervention not taken place, we would have expected
a sum of 3402.13. The 95% interval of this prediction is [2565.15, 4173.42].
The above results are given in terms of absolute numbers. In relative
terms, the response variable showed an increase of +41.71%. The 95%
interval of this percentage is [19.03%, 66.31%].
This means that the positive effect observed during the intervention
period is statistically significant and unlikely to be due to random
fluctuations. It should be noted, however, that the question of whether
this increase also bears substantive significance can only be answered
by comparing the absolute effect (101.35) to the original goal
of the underlying intervention.
The probability of obtaining this effect by chance is very small
(Bayesian one-sided tail-area probability p = 0.0).
This means the causal effect can be considered statistically
significant.
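As a quick sanity check, the headline effect sizes in the report follow directly from the actual and predicted averages it quotes:

```python
# Figures quoted in the report above
actual_avg = 344.36     # observed average clicks in the post period
predicted_avg = 243.01  # counterfactual prediction

# Absolute effect: actual minus counterfactual prediction
absolute_effect = actual_avg - predicted_avg

# Relative effect: absolute effect as a percentage of the prediction
relative_effect = absolute_effect / predicted_avg * 100

print(round(absolute_effect, 2))  # 101.35
print(round(relative_effect, 2))  # 41.71
```

These match the 101.35 absolute effect and +41.71% relative effect reported by the model.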
To see the actual statistics from the Causal Impact model, you can print the output of `model.summary()`. This shows the various figures mentioned in the verbose report above, including the absolute and relative effects, the prediction, and the actual value.
print(model.summary())
Posterior Inference {Causal Impact}
Average Cumulative
Actual 344.36 4821.0
Prediction (s.d.) 243.01 (29.31) 3402.13 (410.28)
95% CI [183.22, 298.1] [2565.15, 4173.42]
Absolute effect (s.d.) 101.35 (29.31) 1418.87 (410.28)
95% CI [46.26, 161.13] [647.58, 2255.85]
Relative effect (s.d.) 41.71% (12.06%) 41.71% (12.06%)
95% CI [19.03%, 66.31%] [19.03%, 66.31%]
Posterior tail-area probability p: 0.0
Posterior prob. of a causal effect: 100.0%
For more details run the command: print(impact.summary('report'))
If you want to observe the time series data in a plot, you can run `model.plot()`. This shows the whole time series and places a vertical line on the chart indicating the start of the post-intervention period, when the site change was made. It plots the predicted response variable `y` alongside the actual values recorded, together with a confidence interval. It also shows the cumulative effect, which in our case is positive, as the site change seems to have worked rather well.
model.plot()
Finally, if you want to see the full data behind the plots, you can output the `inferences` dataframe. This gives you all the raw data, predictions, and actuals, should you wish to analyse them separately.
model.inferences.head()
| | post_cum_y | preds | post_preds | post_preds_lower | post_preds_upper | preds_lower | preds_upper | post_cum_pred | post_cum_pred_lower | post_cum_pred_upper | point_effects | point_effects_lower | point_effects_upper | post_cum_effects | post_cum_effects_lower | post_cum_effects_upper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-07-04 | NaN | 243.000000 | NaN | NaN | NaN | -144088.925913 | 144574.925913 | NaN | NaN | NaN | -107.000000 | -144438.925913 | 144224.925913 | NaN | NaN | NaN |
| 2021-07-05 | NaN | 136.000115 | NaN | NaN | NaN | -75.819322 | 347.819552 | NaN | NaN | NaN | 127.999885 | -83.819552 | 339.819322 | NaN | NaN | NaN |
| 2021-07-06 | NaN | 200.002087 | NaN | NaN | NaN | 16.559069 | 383.445104 | NaN | NaN | NaN | 98.997913 | -84.445104 | 282.440931 | NaN | NaN | NaN |
| 2021-07-07 | NaN | 233.004926 | NaN | NaN | NaN | 60.051324 | 405.958528 | NaN | NaN | NaN | 42.995074 | -129.958528 | 215.948676 | NaN | NaN | NaN |
| 2021-07-08 | NaN | 243.756117 | NaN | NaN | NaN | 76.292710 | 411.219525 | NaN | NaN | NaN | 39.243883 | -128.219525 | 206.707290 | NaN | NaN | NaN |
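Because `inferences` is a regular Pandas dataframe, you can slice it like any other. The snippet below uses a toy stand-in dataframe, with the column names matching the real output but values invented purely for illustration, to show how you might pull out the total cumulative effect and its interval bounds from the final post-period row:

```python
import pandas as pd

# Toy stand-in for model.inferences: column names match the real
# output, but these values are made up for illustration only
inferences = pd.DataFrame(
    {
        'post_cum_effects': [95.0, 210.5, 330.0],
        'post_cum_effects_lower': [20.0, 90.0, 150.0],
        'post_cum_effects_upper': [170.0, 331.0, 510.0],
    },
    index=pd.to_datetime(['2021-07-29', '2021-07-30', '2021-07-31']),
)

# The last row of the post period holds the total cumulative effect
# and its interval bounds
final = inferences.iloc[-1]
print(final['post_cum_effects'])        # 330.0
print(final['post_cum_effects_lower'])  # 150.0
print(final['post_cum_effects_upper'])  # 510.0
```

With the real model object, the same pattern (`model.inferences.iloc[-1]`) would give you the figures that appear in the Cumulative column of `model.summary()`.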
I’ve been using Causal Impact in my work for quite some time, particularly to assess the performance of on-site changes. I originally did what most digital marketers do, and used the Google Search Console API to compare the data between two periods.
However, I’ve now wrapped up my SEO testing code into a single function that allows me to fetch Google Search Console data via the API and run an SEO test via Causal Impact in just three lines of Python code. You can use this via the SEO module in my EcommerceTools Python package.
Matt Clarke, Thursday, August 12, 2021