In ecommerce, email marketing remains one of the most effective (and cost-effective) digital marketing techniques, especially when combined with data science techniques. The vast amounts of customer data generated in ecommerce businesses gives you direct access to some extremely powerful features that can be used to provide more timely, relevant, and interesting content to email subscribers based on what you know about them, or what your models can predict about them.
To get the best results from email marketing, you need to undertake regular analyses of email marketing campaign performance to better understand what works and what doesn’t. The popular Mailchimp Email Service Provider (ESP) platform is one of the most commonly used in ecommerce, especially in small to medium sized companies, and it gives you easy access to these data through a user-friendly user interface.
However, to really get a strong understanding of your campaign performance, and the behaviour of your individual subscribers and the ways in which they’ve interacted with specific campaigns, it’s useful to export the data from Mailchimp and analyse it in Python using Pandas. Thankfully, Mailchimp has a superb Marketing API to allow you to access the data in your account. Here’s a quick guide to using it.
To access the Mailchimp API you’ll need to install the mailchimp-marketing
Python package from PyPi by entering pip3 install mailchimp-marketing
in your terminal, or !pip3 install mailchimp-marketing
in a cell in your Jupyter notebook. The mailchimp-marketing
package is the official Python client library for accessing the Mailchimp Marketing API.
The package lets you run API queries against your Mailchimp account and fetch data to show in reports, create lists, tags, users, and do a whole range of other things. Effectively, the Mailchimp API lets you do almost everything you can do in the Mailchimp interface via a simple API query.
!pip3 install mailchimp-marketing
After you’ve installed the package, you’ll need to import the Pandas package, the mailchimp_marketing
package, and the ApiClientError
module from the api_client
package.
import pandas as pd
import mailchimp_marketing as MailchimpMarketing
from mailchimp_marketing.api_client import ApiClientError
Accessing the data in your Mailchimp account is very easy. You just need to create an API key from within your account and identify the server prefix. This is attached to the end of your API key - mine was us2
. Once you’ve got these, assign them to a couple of variables so you can access them elsewhere in the scripts.
API_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXX-us2'
SERVER_PREFIX = 'us2'
To check your credentials are working correctly, you can use the snippet of code below. This will use Client()
to create a new client object, then pass in your API key and server prefix as configuration parameters. It will then use the ping.get()
method and return the response as a dictionary. If that works you get a health_status
message stating “Everything’s Chimpy”. If it fails, you’ve get an exception message telling you why from the try except block.
try:
client = MailchimpMarketing.Client()
client.set_config({
"api_key": API_KEY,
"server": SERVER_PREFIX
})
response = client.ping.get()
print(response)
except ApiClientError as error:
print(error)
{'health_status': "Everything's Chimpy!"}
The Mailchimp Marketing package includes a REST API component, letting you create pretty much anything you need, as well as a set of useful functions that let you perform common tasks. One of these is the reports.get_all_campaign_reports()
method, which perhaps unsurprisingly, returns all the campaign reports present in your Mailchimp account. I’ve created a little function to run this method or throw an exception.
If you run the function you’ll get back a massive Python dictionary. I’ve not shown this to avoid revealing anything sensitive in my Mailchimp account, but if you run the function you’ll be able to see what we’re dealing with.
The dictionary contains a list called reports
which includes a dictionary containing a whole load of data on each of your Mailchimp email marketing campaigns. Since this is too large and complex to display as it is, or import into Pandas directly without modification, we’ll use some dictionary parsing techniques to extract the important stuff and display it in a Pandas dataframe in the next step.
def _get_all_campaign_reports():
"""Return a list of dictionaries containing reports. """
try:
client = MailchimpMarketing.Client()
client.set_config({
"api_key": API_KEY,
"server": SERVER_PREFIX
})
response = client.reports.get_all_campaign_reports()
return response
except ApiClientError as error:
print("Error: {}".format(error.text))
_get_all_campaign_reports()
To put the data in that massive dictionary into a readable and usable format we will create another function called reports_to_pandas()
. This will run the _get_all_campaign_reports()
function we created above, parse the contents of the dictionary, extract the interesting bits, and put them into a Pandas dataframe.
After running the _get_all_campaign_reports()
function we’ll first create an empty Pandas dataframe containing columns for the values we want to extract from the dictionary. Then we’ll loop over each report in the reports
list, and use the get()
function to extract each named element from the dictionary and assign it to a corresponding value in a dictionary called row
.
If any of the items are present inside a nested dictionary, we can use get()
more than once. For example, report.get('clicks').get('unique_clicks')
will extract the unique_clicks
value from a dictionary called clicks
with the following format: 'clicks': {'clicks_total': 30, 'unique_clicks': 30, 'unique_subscriber_clicks': 28, 'click_rate': 0.010163339382940109, 'last_click': '2021-06-18T15:37:19+00:00'},
.
Once we’ve extracted each element from the dictionary, we’ll then use the append()
function to pass the row to the df
dataframe using the ignore_index=True
argument. We’ll then zero fill any NaN
values and cast the data types of the columns to the correct type, instead of having them all set to object
. Finally, we’ll return the Pandas dataframe.
def reports_to_pandas():
"""Convert a Mailchimp reports dictionary to a Pandas dataframe."""
reports = _get_all_campaign_reports()
df = pd.DataFrame(columns=['id', 'send_time','campaign_title', 'type', 'list_name', 'subject_line',
'preview_text', 'emails_sent', 'abuse_reports', 'unsubscribed',
'hard_bounces', 'soft_bounces', 'syntax_errors', 'forwards_count',
'forwards_opens', 'opens_total', 'unique_opens', 'open_rate',
'clicks_total', 'unique_clicks', 'unique_subscriber_clicks',
'click_rate', 'list_sub_rate', 'list_unsub_rate', 'list_open_rate',
'list_click_rate', 'total_orders', 'total_revenue'
])
if reports:
for report in reports['reports']:
row = {
'id': report.get('id'),
'send_time': report.get('send_time'),
'campaign_title': report.get('campaign_title'),
'type': report.get('type'),
'list_name': report.get('list_name'),
'subject_line': report.get('subject_line'),
'preview_text': report.get('preview_text'),
'emails_sent': report.get('emails_sent'),
'abuse_reports': report.get('abuse_reports'),
'unsubscribed': report.get('unsubscribed'),
'hard_bounces': report.get('bounces').get('hard_bounces'),
'soft_bounces': report.get('bounces').get('soft_bounces'),
'syntax_errors': report.get('syntax_errors'),
'forwards_count': report.get('forwards').get('forwards_count'),
'forwards_opens': report.get('forwards').get('forwards_opens'),
'opens_total': report.get('opens').get('opens_total'),
'unique_opens': report.get('opens').get('unique_opens'),
'open_rate': report.get('opens').get('open_rate'),
'clicks_total': report.get('clicks').get('clicks_total'),
'unique_clicks': report.get('clicks').get('unique_clicks'),
'unique_subscriber_clicks': report.get('clicks').get('unique_subscriber_clicks'),
'click_rate': report.get('clicks').get('click_rate'),
'list_sub_rate': report.get('list_stats').get('sub_rate'),
'list_unsub_rate': report.get('list_stats').get('unsub_rate'),
'list_open_rate': report.get('list_stats').get('open_rate'),
'list_click_rate': report.get('list_stats').get('click_rate'),
'total_orders': report.get('ecommerce').get('total_orders'),
'total_revenue': report.get('ecommerce').get('total_revenue'),
}
df = df.append(row, ignore_index='True')
df = df.fillna(0)
for col in ['emails_sent', 'abuse_reports', 'unsubscribed',
'hard_bounces', 'soft_bounces', 'syntax_errors',
'forwards_count', 'forwards_opens', 'opens_total',
'unique_opens', 'open_rate', 'clicks_total',
'unique_clicks', 'unique_subscriber_clicks', 'click_rate',
'list_sub_rate', 'list_unsub_rate', 'list_unsub_rate',
'list_open_rate', 'list_click_rate', 'total_orders']:
df[col] = df[col].astype(int)
df['total_revenue'] = df['total_revenue'].astype(float)
return df
Now, if you run the reports_to_pandas()
function it will fetch your email marketing reports from your Mailchimp account, parse the dictionary of data, extract the key elements of interest, and place them into a Pandas dataframe. You can then use this for your reports, and save the data to a CSV or other file format.
df = reports_to_pandas()
df.head()
id | send_time | campaign_title | type | list_name | subject_line | preview_text | emails_sent | abuse_reports | unsubscribed | ... | clicks_total | unique_clicks | unique_subscriber_clicks | click_rate | list_sub_rate | list_unsub_rate | list_open_rate | list_click_rate | total_orders | total_revenue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [REMOVED] | 2021-06-18T17:00:00+00:00 | [REMOVED] | regular | [REMOVED] | [REMOVED] | [REMOVED] | 19094 | 0 | 9 | ... | 120 | 115 | 94 | 0 | 533 | 356 | 14 | 1 | 0 | 0.0 |
1 | [REMOVED] | 2021-06-18T08:00:00+00:00 | [REMOVED] | regular | [REMOVED] | [REMOVED] | [REMOVED] | 2756 | 0 | 2 | ... | 37 | 35 | 33 | 0 | 65 | 147 | 19 | 1 | 0 | 0.0 |
2 | [REMOVED] | 2021-06-18T08:00:00+00:00 | [REMOVED] | regular | [REMOVED] | [REMOVED] | [REMOVED] | 23755 | 3 | 25 | ... | 300 | 261 | 231 | 0 | 743 | 472 | 21 | 1 | 0 | 0.0 |
3 | [REMOVED] | 2021-06-15T18:00:00+00:00 | [REMOVED] | regular | [REMOVED] | [REMOVED] | [REMOVED] | 12662 | 0 | 11 | ... | 132 | 114 | 96 | 0 | 743 | 472 | 21 | 1 | 0 | 0.0 |
4 | [REMOVED] | 2021-06-15T18:00:00+00:00 | [REMOVED] | regular | [REMOVED] | [REMOVED] | [REMOVED] | 11471 | 0 | 7 | ... | 297 | 250 | 200 | 0 | 743 | 472 | 21 | 1 | 0 | 0.0 |
5 rows × 28 columns
df.to_csv('reports.csv', index=False)
The other thing you can do with the Mailchimp API that is interesting from a data science perspective is fetch data on the behaviour of individual campaign recipients. The approach for doing this is almost exactly the same as the one above.
However, this time we’re using a different function called get_campaign_recipients()
to return another massive Python dictionary containing data on individual campaign recipients instead of the campaigns themselves. Since this is campaign specific, we need to pass in the campaign_id
value, which you can find in the dataframe above.
def _get_campaign_recipients(campaign_id):
"""Return a list of dictionaries containing campaign recipients. """
try:
client = MailchimpMarketing.Client()
client.set_config({
"api_key": API_KEY,
"server": SERVER_PREFIX
})
response = client.reports.get_campaign_recipients(campaign_id)
return response
except ApiClientError as error:
print("Error: {}".format(error.text))
As with the previous example, running the _get_campaign_recipients()
method returns a huge and complex nested dictionary. In this one, each recipient is present in a list called sent_to
, so if we loop over each person in this list and then parse the underlying dictionary we can create a Pandas dataframe containing the data on each recipient of our email campaign.
After running the _get_campaign_recipients()
function on our chosen campaign we’ll create an empty Pandas dataframe and define the columns. By default, this will include the email_id
, email_address
, and FNAME
and LNAME
merge vars from your Mailchimp account.
However, as I’ve used EcommerceTools to segment my customers and push in data on each customer’s RFM, ABC, latency, tenure, and churn probability, my data includes a number of custom merge vars specific to my account.
def campaign_recipients_to_pandas(campaign_id):
"""Convert a Mailchimp campaign recipients dictionary to a Pandas dataframe."""
reports = _get_campaign_recipients(campaign_id)
df = pd.DataFrame(columns=['email_id', 'email_address', 'FNAME', 'LNAME', 'LATENCY',
'RFMLABEL', 'TYPE', 'ABC_CLASS', 'PRED_CLV',
'R', 'F', 'M', 'H', 'TENURE', 'status', 'open_count'])
if reports:
for report in reports['sent_to']:
row = {
'email_id': report.get('email_id'),
'email_address': report.get('email_address'),
'FNAME': report.get('merge_fields').get('FNAME'),
'LNAME': report.get('merge_fields').get('LNAME'),
'LATENCY': report.get('merge_fields').get('LATENCY'),
'RFMLABEL': report.get('merge_fields').get('RFMLABEL'),
'TYPE': report.get('merge_fields').get('TYPE'),
'ABC_CLASS': report.get('merge_fields').get('ABC_CLASS'),
'PRED_CLV': report.get('merge_fields').get('PRED_CLV'),
'R': report.get('merge_fields').get('R'),
'F': report.get('merge_fields').get('F'),
'M': report.get('merge_fields').get('M'),
'H': report.get('merge_fields').get('H'),
'TENURE': report.get('merge_fields').get('TENURE'),
'status': report.get('status'),
'open_count': report.get('open_count'),
}
df = df.append(row, ignore_index='True')
for col in ['R', 'F', 'M', 'H', 'TENURE', 'open_count']:
df[col] = df[col].astype(int)
return df
After running your campaign_recipients_to_pandas()
function on your chosen campaign_id
, you get back a neat Pandas dataframe containing the data on each recipient who was mailed your campaign.
df = campaign_recipients_to_pandas('85cd6d60f4')
df.head()
email_id | email_address | FNAME | LNAME | LATENCY | RFMLABEL | TYPE | ABC_CLASS | PRED_CLV | R | F | M | H | TENURE | status | open_count | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [REMOVED] | [REMOVED] | [REMOVED] | [REMOVED] | Loyal | customer | D | 505 | 1 | 55 | 1 | 505 | sent | 0 | ||
2 | [REMOVED] | [REMOVED] | [REMOVED] | [REMOVED] | Order overdue | Loyal | customer | D | 6.86 | 560 | 5 | 259 | 3 | 1357 | sent | 0 |
3 | [REMOVED] | [REMOVED] | [REMOVED] | [REMOVED] | Not sure | Hold and improve | customer | D | 0.33 | 946 | 2 | 109 | 1 | 1026 | sent | 0 |
4 | [REMOVED] | [REMOVED] | [REMOVED] | [REMOVED] | Risky | customer | D | 1191 | 1 | 34 | 1 | 1191 | sent | 0 |
Finally, we can do a bit of basic feature engineering to convert some of the categorical features from our Mailchimp customer segmentation to a numeric format. To do this we’ll create a list called column
which contains the categorical data, then use a for
loop to run the Pandas get_dummies()
function on the column to one-hot encode the data, then we’ll join that back to our original dataframe.
columns = ['LATENCY', 'RFMLABEL', 'TYPE', 'ABC_CLASS']
for col in columns:
encoding = pd.get_dummies(df[col], prefix=col)
df = pd.concat([df, encoding], axis=1)
Now, if we run a Pearson’s correlation on the open_count
column, we can see which of the columns and values from our customer segmentation are those which have the strongest impact on email opens. As you can see, there are extremely high correlations with the F
value, denoting each customer’s frequency of ordering, as well as those customers with the RFMLABEL
of “Star”.
Similarly, customers who have a LATENCY
value of “Order overdue” are opening more than those with “Order due soon” and “Order not due”, so we could definitely make use of this label to prevent over-mailing customers who aren’t due to order at the moment.
While RFM and latency may be relatively simple metrics, this shows that they’re extremely powerful in predicting email subscriber behaviour, so can be really useful in improving your email marketing activity.
df[df.columns[1:]].corr()['open_count'][:].sort_values(ascending=False)
open_count 1.000000
F 0.863903
RFMLABEL_Star 0.666667
LATENCY_Order overdue 0.666667
M 0.564565
ABC_CLASS_A 0.509175
TENURE 0.256028
LATENCY_Order due soon -0.111111
RFMLABEL_Hold and improve -0.111111
LATENCY_Order not due -0.111111
ABC_CLASS_C -0.111111
H -0.206725
RFMLABEL_Risky -0.218218
LATENCY_Not sure -0.218218
LATENCY_ -0.218218
RFMLABEL_Loyal -0.272166
R -0.331784
ABC_CLASS_D -0.408248
TYPE_customer NaN
Name: open_count, dtype: float64
One of the key things that impacts email marketing performance is the size of your email list. Every time people sign up to your list you’ll top up the natural attrition you get from unsubscribes. However, if you over-mail or annoy customers by unleashing your spam cannon too often, or send them useless content, they’ll unsubscribe faster than you top them up, causing your list to decline in size.
The Mailchimp API includes a useful method called client.lists.get_list_growth_history()
that returns the number of subscribed list members, and the number of unsubscribes, for each period. As with the above code, we’ll first make a helper function to use internally to fetch that data for our list.
def _get_growth_history(list_id):
"""Measure the growth history of a mailing list."""
try:
client = MailchimpMarketing.Client()
client.set_config({
"api_key": API_KEY,
"server": SERVER_PREFIX
})
response = client.lists.get_list_growth_history(list_id, count=1000)
return response
except ApiClientError as error:
print("Error: {}".format(error.text))
Next, we’ll create a function called growth_history_to_pandas()
which runs the helper function and reformats the data into a Pandas dataframe. This gives us the number of subscribed users in the list for each calendar month. To calculate the change in size month-on-month and year-on-year we can use the shift()
function to create a lagged variable.
To get the monthly change in the list size, we take the current number of subscribers and subtract the number of subscribers in the list the previous month using shift(-1)
. To do this for the same month the previous year we can use shift(-12)
.
def growth_history_to_pandas(list_id):
"""Convert a Mailchimp growth history dictionary to a Pandas dataframe."""
results = _get_growth_history(list_id)
df = pd.DataFrame(columns=['month', 'subscribed'])
if results:
for data in results['history']:
row = {
'month': data.get('month'),
'subscribed': data.get('subscribed'),
}
df = df.append(row, ignore_index='True')
for col in ['subscribed']:
df[col] = df[col].astype(int)
df['list_change_mom'] = df['subscribed'] - df['subscribed'].shift(-1)
df['list_change_yoy'] = df['subscribed'] - df['subscribed'].shift(-12)
return df
Running that function on our given list_id
gives us a neat dataframe containing the results. We can see from this list that it decreased in size in December 2020, but started growing significantly in spring and early summer, but has dropped off in July.
df = growth_history_to_pandas('xxxxxxxx')
df.head(14)
month | subscribed | list_change_mom | list_change_yoy | |
---|---|---|---|---|
0 | 2021-07 | 52898 | 714.0 | 23098.0 |
1 | 2021-06 | 52184 | 3148.0 | 23733.0 |
2 | 2021-05 | 49036 | 8051.0 | 22799.0 |
3 | 2021-04 | 40985 | 3066.0 | 18609.0 |
4 | 2021-03 | 37919 | 2268.0 | 18796.0 |
5 | 2021-02 | 35651 | 1125.0 | 17995.0 |
6 | 2021-01 | 34526 | 1173.0 | 17401.0 |
7 | 2020-12 | 33353 | -89.0 | 16890.0 |
8 | 2020-11 | 33442 | 457.0 | 16974.0 |
9 | 2020-10 | 32985 | 972.0 | 16776.0 |
10 | 2020-09 | 32013 | 1114.0 | 16021.0 |
11 | 2020-08 | 30899 | 1099.0 | 14890.0 |
12 | 2020-07 | 29800 | 1349.0 | 14087.0 |
13 | 2020-06 | 28451 | 2214.0 | 13187.0 |
Now you have these data on your list size, another really useful metric to calculate is the revenue you generate from each email subscriber. To do this, you simply calculate the email revenue generated during a month and year and divide it by the number of list subscribers. If you’re managing your list well, you would expect this figure to rise with time.
Matt Clarke, Saturday, June 19, 2021