How to use the Mailchimp Marketing Python API with Pandas

Learn how to use the Mailchimp API in Python by creating email marketing reports using Pandas and analysing your segments to find out what influences open rates.

How to use the Mailchimp Marketing Python API with Pandas
Picture by Krsto Jevtic, Unsplash.
27 minutes to read

In ecommerce, email marketing remains one of the most effective (and cost-effective) digital marketing techniques, especially when combined with data science techniques. The vast amounts of customer data generated in ecommerce businesses gives you direct access to some extremely powerful features that can be used to provide more timely, relevant, and interesting content to email subscribers based on what you know about them, or what your models can predict about them.

To get the best results from email marketing, you need to undertake regular analyses of email marketing campaign performance to better understand what works and what doesn’t. The popular Mailchimp Email Service Provider (ESP) platform is one of the most commonly used in ecommerce, especially in small to medium sized companies, and it gives you easy access to these data through a user-friendly user interface.

However, to really get a strong understanding of your campaign performance, and the behaviour of your individual subscribers and the ways in which they’ve interacted with specific campaigns, it’s useful to export the data from Mailchimp and analyse it in Python using Pandas. Thankfully, Mailchimp has a superb Marketing API to allow you to access the data in your account. Here’s a quick guide to using it.

Load the packages

To access the Mailchimp API you’ll need to install the mailchimp-marketing Python package from PyPi by entering pip3 install mailchimp-marketing in your terminal, or !pip3 install mailchimp-marketing in a cell in your Jupyter notebook. The mailchimp-marketing package is the official Python client library for accessing the Mailchimp Marketing API.

The package lets you run API queries against your Mailchimp account and fetch data to show in reports, create lists, tags, users, and do a whole range of other things. Effectively, the Mailchimp API lets you do almost everything you can do in the Mailchimp interface via a simple API query.

!pip3 install mailchimp-marketing

After you’ve installed the package, you’ll need to import the Pandas package, the mailchimp_marketing package, and the ApiClientError module from the api_client package.

import pandas as pd
import mailchimp_marketing as MailchimpMarketing
from mailchimp_marketing.api_client import ApiClientError

Get your Mailchimp API credentials

Accessing the data in your Mailchimp account is very easy. You just need to create an API key from within your account and identify the server prefix. This is attached to the end of your API key - mine was us2. Once you’ve got these, assign them to a couple of variables so you can access them elsewhere in the scripts.

API_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXX-us2'
SERVER_PREFIX = 'us2'

To check your credentials are working correctly, you can use the snippet of code below. This will use Client() to create a new client object, then pass in your API key and server prefix as configuration parameters. It will then use the ping.get() method and return the response as a dictionary. If that works you get a health_status message stating “Everything’s Chimpy”. If it fails, you’ve get an exception message telling you why from the try except block.

try:
    client = MailchimpMarketing.Client()
    client.set_config({
        "api_key": API_KEY,
        "server": SERVER_PREFIX
    })
    response = client.ping.get()
    print(response)
except ApiClientError as error:
    print(error)
{'health_status': "Everything's Chimpy!"}

Export your Mailchimp campaign reports

The Mailchimp Marketing package includes a REST API component, letting you create pretty much anything you need, as well as a set of useful functions that let you perform common tasks. One of these is the reports.get_all_campaign_reports() method, which perhaps unsurprisingly, returns all the campaign reports present in your Mailchimp account. I’ve created a little function to run this method or throw an exception.

If you run the function you’ll get back a massive Python dictionary. I’ve not shown this to avoid revealing anything sensitive in my Mailchimp account, but if you run the function you’ll be able to see what we’re dealing with.

The dictionary contains a list called reports which includes a dictionary containing a whole load of data on each of your Mailchimp email marketing campaigns. Since this is too large and complex to display as it is, or import into Pandas directly without modification, we’ll use some dictionary parsing techniques to extract the important stuff and display it in a Pandas dataframe in the next step.

def _get_all_campaign_reports():
    """Return a list of dictionaries containing reports. """
    try:
        client = MailchimpMarketing.Client()
        client.set_config({
            "api_key": API_KEY,
            "server": SERVER_PREFIX
        })

        response = client.reports.get_all_campaign_reports()
        return response
    except ApiClientError as error:
          print("Error: {}".format(error.text))
_get_all_campaign_reports()

Import your Mailchimp reports into Pandas

To put the data in that massive dictionary into a readable and usable format we will create another function called reports_to_pandas(). This will run the _get_all_campaign_reports() function we created above, parse the contents of the dictionary, extract the interesting bits, and put them into a Pandas dataframe.

After running the _get_all_campaign_reports() function we’ll first create an empty Pandas dataframe containing columns for the values we want to extract from the dictionary. Then we’ll loop over each report in the reports list, and use the get() function to extract each named element from the dictionary and assign it to a corresponding value in a dictionary called row.

If any of the items are present inside a nested dictionary, we can use get() more than once. For example, report.get('clicks').get('unique_clicks') will extract the unique_clicks value from a dictionary called clicks with the following format: 'clicks': {'clicks_total': 30, 'unique_clicks': 30, 'unique_subscriber_clicks': 28, 'click_rate': 0.010163339382940109, 'last_click': '2021-06-18T15:37:19+00:00'},.

Once we’ve extracted each element from the dictionary, we’ll then use the append() function to pass the row to the df dataframe using the ignore_index=True argument. We’ll then zero fill any NaN values and cast the data types of the columns to the correct type, instead of having them all set to object. Finally, we’ll return the Pandas dataframe.

def reports_to_pandas():
    """Convert a Mailchimp reports dictionary to a Pandas dataframe."""
    
    reports = _get_all_campaign_reports()
    
    df = pd.DataFrame(columns=['id', 'send_time','campaign_title', 'type', 'list_name', 'subject_line', 
                               'preview_text', 'emails_sent', 'abuse_reports', 'unsubscribed', 
                               'hard_bounces', 'soft_bounces', 'syntax_errors', 'forwards_count',
                               'forwards_opens', 'opens_total', 'unique_opens', 'open_rate',
                               'clicks_total', 'unique_clicks', 'unique_subscriber_clicks', 
                               'click_rate', 'list_sub_rate', 'list_unsub_rate', 'list_open_rate', 
                               'list_click_rate', 'total_orders', 'total_revenue'
                              ])
    
    if reports:
        for report in reports['reports']: 
            row = {
                'id': report.get('id'),
                'send_time': report.get('send_time'),                
                'campaign_title': report.get('campaign_title'),
                'type': report.get('type'),
                'list_name': report.get('list_name'),
                'subject_line': report.get('subject_line'),
                'preview_text': report.get('preview_text'),
                'emails_sent': report.get('emails_sent'),
                'abuse_reports': report.get('abuse_reports'),
                'unsubscribed': report.get('unsubscribed'),
                'hard_bounces': report.get('bounces').get('hard_bounces'),
                'soft_bounces': report.get('bounces').get('soft_bounces'),
                'syntax_errors': report.get('syntax_errors'),
                'forwards_count': report.get('forwards').get('forwards_count'),
                'forwards_opens': report.get('forwards').get('forwards_opens'),
                'opens_total': report.get('opens').get('opens_total'),
                'unique_opens': report.get('opens').get('unique_opens'),
                'open_rate': report.get('opens').get('open_rate'),
                'clicks_total': report.get('clicks').get('clicks_total'),
                'unique_clicks': report.get('clicks').get('unique_clicks'),
                'unique_subscriber_clicks': report.get('clicks').get('unique_subscriber_clicks'),
                'click_rate': report.get('clicks').get('click_rate'),
                'list_sub_rate': report.get('list_stats').get('sub_rate'),
                'list_unsub_rate': report.get('list_stats').get('unsub_rate'),
                'list_open_rate': report.get('list_stats').get('open_rate'),
                'list_click_rate': report.get('list_stats').get('click_rate'),
                'total_orders': report.get('ecommerce').get('total_orders'),
                'total_revenue': report.get('ecommerce').get('total_revenue'),
            }
            
            df = df.append(row, ignore_index='True')
        
        df = df.fillna(0)
            
        for col in ['emails_sent', 'abuse_reports', 'unsubscribed', 
                    'hard_bounces', 'soft_bounces', 'syntax_errors',
                    'forwards_count', 'forwards_opens', 'opens_total', 
                    'unique_opens', 'open_rate', 'clicks_total', 
                    'unique_clicks', 'unique_subscriber_clicks', 'click_rate',
                    'list_sub_rate', 'list_unsub_rate', 'list_unsub_rate', 
                    'list_open_rate', 'list_click_rate', 'total_orders']:
            df[col] = df[col].astype(int)
            
        df['total_revenue'] = df['total_revenue'].astype(float)
        
        return df

Now, if you run the reports_to_pandas() function it will fetch your email marketing reports from your Mailchimp account, parse the dictionary of data, extract the key elements of interest, and place them into a Pandas dataframe. You can then use this for your reports, and save the data to a CSV or other file format.

df = reports_to_pandas()
df.head()
id send_time campaign_title type list_name subject_line preview_text emails_sent abuse_reports unsubscribed ... clicks_total unique_clicks unique_subscriber_clicks click_rate list_sub_rate list_unsub_rate list_open_rate list_click_rate total_orders total_revenue
0 [REMOVED] 2021-06-18T17:00:00+00:00 [REMOVED] regular [REMOVED] [REMOVED] [REMOVED] 19094 0 9 ... 120 115 94 0 533 356 14 1 0 0.0
1 [REMOVED] 2021-06-18T08:00:00+00:00 [REMOVED] regular [REMOVED] [REMOVED] [REMOVED] 2756 0 2 ... 37 35 33 0 65 147 19 1 0 0.0
2 [REMOVED] 2021-06-18T08:00:00+00:00 [REMOVED] regular [REMOVED] [REMOVED] [REMOVED] 23755 3 25 ... 300 261 231 0 743 472 21 1 0 0.0
3 [REMOVED] 2021-06-15T18:00:00+00:00 [REMOVED] regular [REMOVED] [REMOVED] [REMOVED] 12662 0 11 ... 132 114 96 0 743 472 21 1 0 0.0
4 [REMOVED] 2021-06-15T18:00:00+00:00 [REMOVED] regular [REMOVED] [REMOVED] [REMOVED] 11471 0 7 ... 297 250 200 0 743 472 21 1 0 0.0

5 rows × 28 columns

df.to_csv('reports.csv', index=False)

Analyse recipient activity data

The other thing you can do with the Mailchimp API that is interesting from a data science perspective is fetch data on the behaviour of individual campaign recipients. The approach for doing this is almost exactly the same as the one above.

However, this time we’re using a different function called get_campaign_recipients() to return another massive Python dictionary containing data on individual campaign recipients instead of the campaigns themselves. Since this is campaign specific, we need to pass in the campaign_id value, which you can find in the dataframe above.

def _get_campaign_recipients(campaign_id):
    """Return a list of dictionaries containing campaign recipients. """
    try:
        client = MailchimpMarketing.Client()
        client.set_config({
            "api_key": API_KEY,
            "server": SERVER_PREFIX
        })

        response = client.reports.get_campaign_recipients(campaign_id)
        return response
    except ApiClientError as error:
          print("Error: {}".format(error.text))

As with the previous example, running the _get_campaign_recipients() method returns a huge and complex nested dictionary. In this one, each recipient is present in a list called sent_to, so if we loop over each person in this list and then parse the underlying dictionary we can create a Pandas dataframe containing the data on each recipient of our email campaign.

After running the _get_campaign_recipients() function on our chosen campaign we’ll create an empty Pandas dataframe and define the columns. By default, this will include the email_id, email_address, and FNAME and LNAME merge vars from your Mailchimp account.

However, as I’ve used EcommerceTools to segment my customers and push in data on each customer’s RFM, ABC, latency, tenure, and churn probability, my data includes a number of custom merge vars specific to my account.

def campaign_recipients_to_pandas(campaign_id):
    """Convert a Mailchimp campaign recipients dictionary to a Pandas dataframe."""
    
    reports = _get_campaign_recipients(campaign_id)

    df = pd.DataFrame(columns=['email_id', 'email_address', 'FNAME', 'LNAME', 'LATENCY', 
                               'RFMLABEL', 'TYPE', 'ABC_CLASS', 'PRED_CLV', 
                               'R', 'F', 'M', 'H', 'TENURE', 'status', 'open_count'])
    
    if reports:
        for report in reports['sent_to']: 
            row = {
                'email_id': report.get('email_id'),
                'email_address': report.get('email_address'),
                'FNAME': report.get('merge_fields').get('FNAME'),
                'LNAME': report.get('merge_fields').get('LNAME'),
                'LATENCY': report.get('merge_fields').get('LATENCY'),
                'RFMLABEL': report.get('merge_fields').get('RFMLABEL'),
                'TYPE': report.get('merge_fields').get('TYPE'),
                'ABC_CLASS': report.get('merge_fields').get('ABC_CLASS'),
                'PRED_CLV': report.get('merge_fields').get('PRED_CLV'),
                'R': report.get('merge_fields').get('R'),
                'F': report.get('merge_fields').get('F'),
                'M': report.get('merge_fields').get('M'),
                'H': report.get('merge_fields').get('H'),              
                'TENURE': report.get('merge_fields').get('TENURE'),              
                'status': report.get('status'),
                'open_count': report.get('open_count'),
            }
        
            df = df.append(row, ignore_index='True')
        
        for col in ['R', 'F', 'M', 'H', 'TENURE', 'open_count']:
            df[col] = df[col].astype(int)
        
        return df

After running your campaign_recipients_to_pandas() function on your chosen campaign_id, you get back a neat Pandas dataframe containing the data on each recipient who was mailed your campaign.

df = campaign_recipients_to_pandas('85cd6d60f4')
df.head()
email_id email_address FNAME LNAME LATENCY RFMLABEL TYPE ABC_CLASS PRED_CLV R F M H TENURE status open_count
0 [REMOVED] [REMOVED] [REMOVED] [REMOVED] Loyal customer D 505 1 55 1 505 sent 0
2 [REMOVED] [REMOVED] [REMOVED] [REMOVED] Order overdue Loyal customer D 6.86 560 5 259 3 1357 sent 0
3 [REMOVED] [REMOVED] [REMOVED] [REMOVED] Not sure Hold and improve customer D 0.33 946 2 109 1 1026 sent 0
4 [REMOVED] [REMOVED] [REMOVED] [REMOVED] Risky customer D 1191 1 34 1 1191 sent 0

Analysing your email marketing segmentation

Finally, we can do a bit of basic feature engineering to convert some of the categorical features from our Mailchimp customer segmentation to a numeric format. To do this we’ll create a list called column which contains the categorical data, then use a for loop to run the Pandas get_dummies() function on the column to one-hot encode the data, then we’ll join that back to our original dataframe.

columns = ['LATENCY', 'RFMLABEL', 'TYPE', 'ABC_CLASS']
for col in columns:
    encoding = pd.get_dummies(df[col], prefix=col)
    df = pd.concat([df, encoding], axis=1)

Now, if we run a Pearson’s correlation on the open_count column, we can see which of the columns and values from our customer segmentation are those which have the strongest impact on email opens. As you can see, there are extremely high correlations with the F value, denoting each customer’s frequency of ordering, as well as those customers with the RFMLABEL of “Star”.

Similarly, customers who have a LATENCY value of “Order overdue” are opening more than those with “Order due soon” and “Order not due”, so we could definitely make use of this label to prevent over-mailing customers who aren’t due to order at the moment.

While RFM and latency may be relatively simple metrics, this shows that they’re extremely powerful in predicting email subscriber behaviour, so can be really useful in improving your email marketing activity.

df[df.columns[1:]].corr()['open_count'][:].sort_values(ascending=False)
open_count                   1.000000
F                            0.863903
RFMLABEL_Star                0.666667
LATENCY_Order overdue        0.666667
M                            0.564565
ABC_CLASS_A                  0.509175
TENURE                       0.256028
LATENCY_Order due soon      -0.111111
RFMLABEL_Hold and improve   -0.111111
LATENCY_Order not due       -0.111111
ABC_CLASS_C                 -0.111111
H                           -0.206725
RFMLABEL_Risky              -0.218218
LATENCY_Not sure            -0.218218
LATENCY_                    -0.218218
RFMLABEL_Loyal              -0.272166
R                           -0.331784
ABC_CLASS_D                 -0.408248
TYPE_customer                     NaN
Name: open_count, dtype: float64

Measuring list growth history

One of the key things that impacts email marketing performance is the size of your email list. Every time people sign up to your list you’ll top up the natural attrition you get from unsubscribes. However, if you over-mail or annoy customers by unleashing your spam cannon too often, or send them useless content, they’ll unsubscribe faster than you top them up, causing your list to decline in size.

The Mailchimp API includes a useful method called client.lists.get_list_growth_history() that returns the number of subscribed list members, and the number of unsubscribes, for each period. As with the above code, we’ll first make a helper function to use internally to fetch that data for our list.

def _get_growth_history(list_id):
    """Measure the growth history of a mailing list."""
    try:
        client = MailchimpMarketing.Client()
        client.set_config({
            "api_key": API_KEY,
            "server": SERVER_PREFIX
        })
        
        response = client.lists.get_list_growth_history(list_id, count=1000)
        return response
    except ApiClientError as error:
          print("Error: {}".format(error.text))

Next, we’ll create a function called growth_history_to_pandas() which runs the helper function and reformats the data into a Pandas dataframe. This gives us the number of subscribed users in the list for each calendar month. To calculate the change in size month-on-month and year-on-year we can use the shift() function to create a lagged variable.

To get the monthly change in the list size, we take the current number of subscribers and subtract the number of subscribers in the list the previous month using shift(-1). To do this for the same month the previous year we can use shift(-12).

def growth_history_to_pandas(list_id):
    """Convert a Mailchimp growth history dictionary to a Pandas dataframe."""
    
    results = _get_growth_history(list_id)

    df = pd.DataFrame(columns=['month', 'subscribed'])
    
    if results:
        for data in results['history']: 
            row = {
                'month': data.get('month'),
                'subscribed': data.get('subscribed'),
            }
        
            df = df.append(row, ignore_index='True')
        
        for col in ['subscribed']:
            df[col] = df[col].astype(int)
        
        df['list_change_mom'] = df['subscribed'] - df['subscribed'].shift(-1)
        df['list_change_yoy'] = df['subscribed'] - df['subscribed'].shift(-12)
        return df

Running that function on our given list_id gives us a neat dataframe containing the results. We can see from this list that it decreased in size in December 2020, but started growing significantly in spring and early summer, but has dropped off in July.

df = growth_history_to_pandas('xxxxxxxx')
df.head(14)
month subscribed list_change_mom list_change_yoy
0 2021-07 52898 714.0 23098.0
1 2021-06 52184 3148.0 23733.0
2 2021-05 49036 8051.0 22799.0
3 2021-04 40985 3066.0 18609.0
4 2021-03 37919 2268.0 18796.0
5 2021-02 35651 1125.0 17995.0
6 2021-01 34526 1173.0 17401.0
7 2020-12 33353 -89.0 16890.0
8 2020-11 33442 457.0 16974.0
9 2020-10 32985 972.0 16776.0
10 2020-09 32013 1114.0 16021.0
11 2020-08 30899 1099.0 14890.0
12 2020-07 29800 1349.0 14087.0
13 2020-06 28451 2214.0 13187.0

Now you have these data on your list size, another really useful metric to calculate is the revenue you generate from each email subscriber. To do this, you simply calculate the email revenue generated during a month and year and divide it by the number of list subscribers. If you’re managing your list well, you would expect this figure to rise with time.

Matt Clarke, Saturday, June 19, 2021

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.