In B2B ecommerce, there are two main approaches to new customer acquisition: you either rely on your website to acquire customers for you, or you target specific customers through sales calls or marketing, typically using data acquisition, or LinkedIn.
B2B sales teams generally manage the larger accounts, who spend more, place more frequent orders, and have specific requirements for account management. Sales directors want assistance in acquiring more of these, but they’re few and far between, and take time to grow.
B2B ecommerce websites generate high volumes of acquisitions, but do so in an uncontrolled manner. Since it’s rare to acquire a customer who is already large and is going to spend a lot from day one, B2B ecommerce acquisitions will be smaller, so data acquisition is used to target larger businesses that aren’t being picked up online.
To identify the specific clients for their sales or marketing team to contact, B2B ecommerce retailers will use either intuition, or better, the results of lookalike modeling. Lookalike modeling aims to examine the targetable traits of existing high value customers and then seeks to acquire lists containing more customers with similar traits.
For example, lookalike modeling might show a correlation between the company sector or location and the likelihood for a business to become a major client.
Once you’ve identified the correlated traits, you can provide your list of requirements to a data provider, such as Experian, along with your customer database, and a hefty fee. They’ll return the list of companies you don’t have, so your sales team can make contact with them.
However, you can actually construct a dataset yourself that contains every business in the UK, potentially saving you thousands. While it doesn’t include every branch, it should include all the parent companies and head office addresses to get you started. Here’s how it’s done.
Picture by Adrian Agawin, Pexels.
Open up a Python script or Jupyter notebook and import
pandas. Most of these modules are built into Python, so you shouldn’t need to install them, and you’ll likely already have Pandas!
from os.path import basename import urllib.request import zipfile import glob import pandas as pd
Next we’re going to download the public Basic Company Data from the UK government’s Companies House website at data.gov.uk. This data is updated monthly and is available for download within five days of the previous month end.
There’s well over 1 GB of raw data here, which is currently split up into six separate zip files containing around 70 MB of compressed data and about 410MB each when uncompressed. We’ll grab the URLs to each of these zip files and place them in a list called
urls = ['http://download.companieshouse.gov.uk/BasicCompanyData-part1.zip', 'http://download.companieshouse.gov.uk/BasicCompanyData-part2.zip', 'http://download.companieshouse.gov.uk/BasicCompanyData-part3.zip', 'http://download.companieshouse.gov.uk/BasicCompanyData-part4.zip', 'http://download.companieshouse.gov.uk/BasicCompanyData-part5.zip', 'http://download.companieshouse.gov.uk/BasicCompanyData-part6.zip']
Next, we’ll create a
for loop and loop through each of the URLs. For each URL, we’ll create a HTTP request using
urllib.request.urlopen() and we’ll parse the
response.url to obtain the filename component. Then, we’ll use
urllib.request.urlretrieve() to download each file to a directory called
data and we’ll preserve the original filename.
for url in urls: print('Downloading:', url) response = urllib.request.urlopen(url) filename = basename(response.url) urllib.request.urlretrieve(url, 'data/'+filename)
Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part1.zip Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part2.zip Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part3.zip Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part4.zip Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part5.zip Downloading: http://download.companieshouse.gov.uk/BasicCompanyData-part6.zip
The files are currently compressed with zip, so we need to uncompress them and extract the CSV file held within each one. First, we’ll use
glob to search the
data directory and return a list containing all of the file names that have a
files = glob.glob('data/*.zip') files
['data/BasicCompanyData-part1.zip', 'data/BasicCompanyData-part2.zip', 'data/BasicCompanyData-part4.zip', 'data/BasicCompanyData-part3.zip', 'data/BasicCompanyData-part5.zip', 'data/BasicCompanyData-part6.zip']
Next, we’ll create a
for loop and loop through the files, first opening the file using
ZipFile() and then extracting its contents with
extractall(), which we’ll save to a new directory called
for file in files: print('Unzipping:',file) with zipfile.ZipFile(file, 'r') as unzip: unzip.extractall('data/raw')
Unzipping: data/BasicCompanyData-part1.zip Unzipping: data/BasicCompanyData-part2.zip Unzipping: data/BasicCompanyData-part4.zip Unzipping: data/BasicCompanyData-part3.zip Unzipping: data/BasicCompanyData-part5.zip Unzipping: data/BasicCompanyData-part6.zip
Next we can use
glob() again to obtain a list of all the CSV files in the
data/raw directory and assign their file paths to a list called
csv_files = glob.glob('data/raw/*.csv') csv_files
['data/raw/BasicCompanyData-2021-01-01-part2_6.csv', 'data/raw/BasicCompanyData-2021-01-01-part1_6.csv', 'data/raw/BasicCompanyData-2021-01-01-part3_6.csv', 'data/raw/BasicCompanyData-2021-01-01-part4_6.csv', 'data/raw/BasicCompanyData-2021-01-01-part5_6.csv', 'data/raw/BasicCompanyData-2021-01-01-part6_6.csv']
Finally, we’ll create a new list called
all_files, then we’ll loop through each file, read the CSV into a Pandas dataframe, and we’ll append the dataframe to
All we need to do to put these in one huge file is to pass the
all_files list to
pd.concat() and we’ve merged everything together.
all_files =  for filename in csv_files: df = pd.read_csv(filename, index_col=None, header=0, low_memory=False) all_files.append(df) companies = pd.concat(all_files, axis=0, ignore_index=True)
|CompanyName||CDS LABOUR (PRESTON) LTD|
|CompanyCategory||Private Limited Company|
|CompanyStatus||Active - Proposal to Strike off|
|Accounts.AccountCategory||NO ACCOUNTS FILED|
|SICCode.SicText_1||78200 - Temporary employment agency activities|
|PreviousName_1.CompanyName||CL MS (BOSTON) LTD|
This gives us a dataset containing 4.837 million companies, which is every business head office held in Companies House. You can now use your data science skills to see who you haven’t got, and who your sales and marketing teams should be contacting.
The data in Companies House is added by the companies themselves, so you’ll typically get the head office address (not the branches), and the SIC codes are those added by the company, which may not represent the actual SIC code if you assigned it manually.
Matt Clarke, Saturday, March 13, 2021