As I explained in my previous post, many B2B ecommerce businesses spend huge amounts on procuring third-party data for companies they wish to target. However, with some data science skills and a bit of ingenuity you can also produce a dataset of your own - for free.
In this project, I’ll show you can use this data to target specific companies. We’ll see what companies are included, identify the features we want to target and exclude, and use Pandas to create a dataset of estate agents we can approach. Here’s how it’s done.
First, open up a Jupyter notebook and import pandas
. As the dataset we’re using contains many columns, you may also want to use set_option()
to increase the number of rows and columns shown. Needless to say, as this is a massive dataset, you’ll need a powerful workstation!
import pandas as pd
pd.set_option('max_columns', 100)
pd.set_option('max_rows', 100)
I’ll assume you’ve followed my post How to create a dataset containing all UK companies explaining how you can build the required dataset. Once you’ve followed this, load the data into Pandas using the low_memory=False
argument. It currently comes in at around 1.8 GB.
df = pd.read_csv('companies.csv', low_memory=False)
df.head()
CompanyName | CompanyNumber | RegAddress.CareOf | RegAddress.POBox | RegAddress.AddressLine1 | RegAddress.AddressLine2 | RegAddress.PostTown | RegAddress.County | RegAddress.Country | RegAddress.PostCode | CompanyCategory | CompanyStatus | CountryOfOrigin | DissolutionDate | IncorporationDate | Accounts.AccountRefDay | Accounts.AccountRefMonth | Accounts.NextDueDate | Accounts.LastMadeUpDate | Accounts.AccountCategory | Returns.NextDueDate | Returns.LastMadeUpDate | Mortgages.NumMortCharges | Mortgages.NumMortOutstanding | Mortgages.NumMortPartSatisfied | Mortgages.NumMortSatisfied | SICCode.SicText_1 | SICCode.SicText_2 | SICCode.SicText_3 | SICCode.SicText_4 | LimitedPartnerships.NumGenPartners | LimitedPartnerships.NumLimPartners | URI | PreviousName_1.CONDATE | PreviousName_1.CompanyName | PreviousName_2.CONDATE | PreviousName_2.CompanyName | PreviousName_3.CONDATE | PreviousName_3.CompanyName | PreviousName_4.CONDATE | PreviousName_4.CompanyName | PreviousName_5.CONDATE | PreviousName_5.CompanyName | PreviousName_6.CONDATE | PreviousName_6.CompanyName | PreviousName_7.CONDATE | PreviousName_7.CompanyName | PreviousName_8.CONDATE | PreviousName_8.CompanyName | PreviousName_9.CONDATE | PreviousName_9.CompanyName | PreviousName_10.CONDATE | PreviousName_10.CompanyName | ConfStmtNextDueDate | ConfStmtLastMadeUpDate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | CDS LABOUR (PRESTON) LTD | 11995337 | NaN | NaN | EVENTUS | SUNDERLAND ROAD | PETERBOROUGH | CAMBRIDGESHIRE | UNITED KINGDOM | PE6 8FD | Private Limited Company | Active - Proposal to Strike off | United Kingdom | NaN | 14/05/2019 | 31.0 | 5.0 | 14/05/2021 | NaN | NO ACCOUNTS FILED | 11/06/2020 | NaN | 0 | 0 | 0 | 0 | 78200 - Temporary employment agency activities | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/11995337 | 01/11/2019 | CL MS (BOSTON) LTD | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 27/05/2020 | NaN |
1 | CDS LABOUR (SOFIA) LTD | 11995382 | NaN | NaN | EVENTUS | SUNDERLAND ROAD | PETERBOROUGH | CAMBRIDGESHIRE | UNITED KINGDOM | PE6 8FD | Private Limited Company | Active - Proposal to Strike off | United Kingdom | NaN | 14/05/2019 | 31.0 | 5.0 | 14/05/2021 | NaN | NO ACCOUNTS FILED | 11/06/2020 | NaN | 0 | 0 | 0 | 0 | 78200 - Temporary employment agency activities | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/11995382 | 01/11/2019 | CL MC (SPALDING) LTD | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 27/05/2020 | NaN |
2 | CDS LABOUR (WISBECH) LTD | 11543682 | NaN | NaN | EVENTUS | SUNDERLAND ROAD | MARKET DEEPING | LINCOLNSHIRE | ENGLAND | PE6 8FD | Private Limited Company | Active | United Kingdom | NaN | 30/08/2018 | 31.0 | 8.0 | 31/05/2021 | 31/08/2019 | DORMANT | 27/09/2019 | NaN | 0 | 0 | 0 | 0 | 78200 - Temporary employment agency activities | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/11543682 | 06/09/2019 | CL MC (WISBECH) LTD | 14/05/2019 | CDSL(WISBECH)FBS LTD | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13/09/2021 | 30/08/2020 |
3 | CDS LEISURE LIMITED | 06910514 | TOPPING PARTNERSHIP | NaN | INCOM HOUSE WATERSIDE | TRAFFORD PARK | MANCHESTER | NaN | NaN | M17 1WD | Private Limited Company | Active | United Kingdom | NaN | 19/05/2009 | 31.0 | 5.0 | 31/05/2021 | 31/05/2019 | TOTAL EXEMPTION FULL | 16/06/2017 | 19/05/2016 | 0 | 0 | 0 | 0 | 46420 - Wholesale of clothing and footwear | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/06910514 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 02/06/2021 | 19/05/2020 |
4 | CDS LOCUM LIMITED | 12546137 | NaN | NaN | OFFICE 27 | 8-9 RODNEY ROAD | PORTSMOUTH | NaN | UNITED KINGDOM | PO4 8BF | Private Limited Company | Active | United Kingdom | NaN | 03/04/2020 | 30.0 | 4.0 | 03/01/2022 | NaN | NO ACCOUNTS FILED | 01/05/2021 | NaN | 0 | 0 | 0 | 0 | 96090 - Other service activities n.e.c. | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/12546137 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 16/04/2021 | NaN |
Now that the data are loaded up, we’ll take a look at the value_counts()
for various key columns in the dataset, so we can identify what we need to include and exclude to give us a higher quality final dataset.
First, we’ll check out the CompanyStatus
column. 4.514 million of the companies in the dataset are currently marked as “Active”, so we’ll want to include these and exclude the rest. It would be a waste of time and resources to go after the ones who are about to go tits-up.
df['CompanyStatus'].value_counts()
Active 4514967
Active - Proposal to Strike off 220903
Liquidation 95396
In Administration 3113
Live but Receiver Manager on at least one charge 1315
Voluntary Arrangement 893
In Administration/Administrative Receiver 336
RECEIVERSHIP 194
ADMINISTRATION ORDER 130
ADMINISTRATIVE RECEIVER 116
In Administration/Receiver Manager 44
RECEIVER MANAGER / ADMINISTRATIVE RECEIVER 16
VOLUNTARY ARRANGEMENT / ADMINISTRATIVE RECEIVER 1
VOLUNTARY ARRANGEMENT / RECEIVER MANAGER 1
Name: CompanyStatus, dtype: int64
The Standard Industry Classification or SIC code is a (usually) four digit code used to classify economic activities or industries. SIC codes are used by Companies House and all the third party data re-sellers, such as Experian, to let customers target businesses in specific sectors.
However, importantly, the SIC codes stored in Companies House are added by the company when they file their accounts. If Maggie in accounts at Tesco thinks her office falls under “70.1 Activities of head offices”, she’s free to assign it there, rather than to “47110 Retail sale in non-specialised stores with food, beverages or tobacco predominating”.
The SIC codes have various levels, allowing you to drill down your search to increasingly more specific sectors. Simply find the SIC code you think matches your target sector most closely and you can search for all possible matches.
df['SICCode.SicText_1'].value_counts()
82990 - Other business support service activities n.e.c. 245210
70229 - Management consultancy activities other than financial management 205617
None Supplied 176707
68209 - Other letting and operating of own or leased real estate 163242
96090 - Other service activities n.e.c. 147043
...
7121 - Rent other land transport equipment 1
5181 - Wholesale of machine tools 1
1320 - Mining non-ferrous metal ores 1
1422 - Mining of clays and kaolin 1
3350 - Manufacture of watches and clocks 1
Name: SICCode.SicText_1, Length: 1177, dtype: int64
df['SICCode.SicText_2'].value_counts()
68209 - Other letting and operating of own or leased real estate 53167
70229 - Management consultancy activities other than financial management 23765
62020 - Information technology consultancy activities 18598
41202 - Construction of domestic buildings 16992
82990 - Other business support service activities n.e.c. 16266
...
7413 - Market research, opinion polling 1
05200 - Mining of lignite 1
1583 - Manufacture of sugar 1
2221 - Printing of newspapers 1
7482 - Packaging activities 1
Name: SICCode.SicText_2, Length: 1012, dtype: int64
df['SICCode.SicText_3'].value_counts()
68320 - Management of real estate on a fee or contract basis 13878
68209 - Other letting and operating of own or leased real estate 10902
70229 - Management consultancy activities other than financial management 8608
62090 - Other information technology service activities 7347
82990 - Other business support service activities n.e.c. 6874
...
24460 - Processing of nuclear fuel 1
2416 - Manufacture of plastics in primary forms 1
2874 - Manufacture fasteners, screw, chains etc. 1
9302 - Hairdressing & other beauty treatment 1
13940 - Manufacture of cordage, rope, twine and netting 1
Name: SICCode.SicText_3, Length: 871, dtype: int64
df['SICCode.SicText_4'].value_counts()
68320 - Management of real estate on a fee or contract basis 8272
96090 - Other service activities n.e.c. 4314
82990 - Other business support service activities n.e.c. 3959
70229 - Management consultancy activities other than financial management 3866
85600 - Educational support services 2747
...
13922 - manufacture of canvas goods, sacks, etc. 1
2225 - Ancillary printing operations 1
5020 - Maintenance & repair of motors 1
7012 - Buying & sell own real estate 1
9212 - Motion picture & video distribution 1
Name: SICCode.SicText_4, Length: 719, dtype: int64
Next, we’ll create a helper function to search the Companies House data. The major caveat here is that the data aren’t that clean. If you restrict your search to a county, country, or other parameter, you may find it misses companies, as some records are incomplete, or inconsistent, so check first.
def get_matches(df, sic_code, county=None):
"""Return a dataframe of active SIC code matches.
Args:
df (dataframe): Pandas dataframe containing Companies House data
sic_code (string): SIC code to find, i.e. 68320
county (string, optional): Optional county (default = None)
"""
if county:
return df[ (df['CompanyStatus']=='Active') \
& (df['SICCode.SicText_1'].str.startswith(sic_code + ' -')) \
& (df['RegAddress.County']==str.upper(county))
]
else:
return df[ (df['CompanyStatus']=='Active') & (df['SICCode.SicText_1'].str.startswith(sic_code + ' -') )]
To find all the active estate agents in the UK, all we need to do now is pass the df
and the SIC code to get_matches()
and we get back a Pandas dataframe containing our results. Running .shape
on this shows that we get back 159,188 companies currently marked active in Companies House.
df_estate_agents = get_matches(df, sic_code='68209', )
df_estate_agents.head()
CompanyName | CompanyNumber | RegAddress.CareOf | RegAddress.POBox | RegAddress.AddressLine1 | RegAddress.AddressLine2 | RegAddress.PostTown | RegAddress.County | RegAddress.Country | RegAddress.PostCode | CompanyCategory | CompanyStatus | CountryOfOrigin | DissolutionDate | IncorporationDate | Accounts.AccountRefDay | Accounts.AccountRefMonth | Accounts.NextDueDate | Accounts.LastMadeUpDate | Accounts.AccountCategory | Returns.NextDueDate | Returns.LastMadeUpDate | Mortgages.NumMortCharges | Mortgages.NumMortOutstanding | Mortgages.NumMortPartSatisfied | Mortgages.NumMortSatisfied | SICCode.SicText_1 | SICCode.SicText_2 | SICCode.SicText_3 | SICCode.SicText_4 | LimitedPartnerships.NumGenPartners | LimitedPartnerships.NumLimPartners | URI | PreviousName_1.CONDATE | PreviousName_1.CompanyName | PreviousName_2.CONDATE | PreviousName_2.CompanyName | PreviousName_3.CONDATE | PreviousName_3.CompanyName | PreviousName_4.CONDATE | PreviousName_4.CompanyName | PreviousName_5.CONDATE | PreviousName_5.CompanyName | PreviousName_6.CONDATE | PreviousName_6.CompanyName | PreviousName_7.CONDATE | PreviousName_7.CompanyName | PreviousName_8.CONDATE | PreviousName_8.CompanyName | PreviousName_9.CONDATE | PreviousName_9.CompanyName | PreviousName_10.CONDATE | PreviousName_10.CompanyName | ConfStmtNextDueDate | ConfStmtLastMadeUpDate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
30 | CDS PROPERTIES (NORTHERN) LIMITED | SC537855 | NaN | NaN | C/O PROPERTY ACCOUNTS LIMITED | 27 LAURISTON STREET | EDINBURGH | NaN | NaN | EH3 9DQ | Private Limited Company | Active | United Kingdom | NaN | 14/06/2016 | 30.0 | 6.0 | 30/06/2021 | 30/06/2019 | NaN | 12/07/2017 | NaN | 11 | 9 | 0 | 2 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/SC537855 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 07/06/2021 | 24/05/2020 |
61 | CDS TRADING UK LIMITED | 04636487 | NaN | NaN | 8 HEMMELLS | NaN | BASILDON | ESSEX | ENGLAND | SS15 6ED | Private Limited Company | Active | United Kingdom | NaN | 14/01/2003 | 31.0 | 3.0 | 31/03/2021 | 31/03/2019 | TOTAL EXEMPTION FULL | 11/02/2017 | 14/01/2016 | 3 | 2 | 0 | 1 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/04636487 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 25/02/2021 | 14/01/2020 |
74 | CDSB LIMITED | 12916865 | NaN | NaN | 16 EURO BUSINESS PARK | NEW ROAD | NEWHAVEN | EAST SUSSEX | UNITED KINGDOM | BN9 0DQ | Private Limited Company | Active | United Kingdom | NaN | 30/09/2020 | 30.0 | 9.0 | 30/06/2022 | NaN | NO ACCOUNTS FILED | 28/10/2021 | NaN | 0 | 0 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/12916865 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13/10/2021 | NaN |
146 | CDT PROPERTY INVESTMENTS LIMITED | 10695269 | NaN | NaN | SUITE 1 SECOND FLOOR EVERDENE HOUSE | DEANSLEIGH ROAD | BOURNEMOUTH | NaN | UNITED KINGDOM | BH7 7DU | Private Limited Company | Active | United Kingdom | NaN | 28/03/2017 | 31.0 | 3.0 | 31/12/2021 | 31/03/2020 | TOTAL EXEMPTION FULL | 25/04/2018 | NaN | 2 | 2 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/10695269 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10/04/2021 | 27/03/2020 |
200 | CDVRE HOLDINGS LTD | 09232551 | NaN | NaN | 8B ACCOMMODATION ROAD | GOLDERS GREEN | LONDON | NaN | NaN | NW11 8ED | Private Limited Company | Active | United Kingdom | NaN | 24/09/2014 | 30.0 | 9.0 | 30/06/2022 | 30/09/2020 | UNAUDITED ABRIDGED | 22/10/2017 | 24/06/2016 | 3 | 3 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/09232551 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 27/09/2021 | 13/09/2020 |
df_estate_agents.shape
(159188, 55)
To find only the active estate agents in Norfolk, we can pass in optional county
argument. Running this gives us 834 estate agents currently active in Norfolk. And that’s it.
You can, of course, come up with your own funky filters or regexes to find businesses that haven’t been categorised properly, but this is surprisingly effective for free! Note that it may not give you permission to market to them, but this may fall under legitimate interest GDPR wise.
df_estate_agents_norfolk = get_matches(df, sic_code='68209', county='Norfolk')
df_estate_agents_norfolk.head()
CompanyName | CompanyNumber | RegAddress.CareOf | RegAddress.POBox | RegAddress.AddressLine1 | RegAddress.AddressLine2 | RegAddress.PostTown | RegAddress.County | RegAddress.Country | RegAddress.PostCode | CompanyCategory | CompanyStatus | CountryOfOrigin | DissolutionDate | IncorporationDate | Accounts.AccountRefDay | Accounts.AccountRefMonth | Accounts.NextDueDate | Accounts.LastMadeUpDate | Accounts.AccountCategory | Returns.NextDueDate | Returns.LastMadeUpDate | Mortgages.NumMortCharges | Mortgages.NumMortOutstanding | Mortgages.NumMortPartSatisfied | Mortgages.NumMortSatisfied | SICCode.SicText_1 | SICCode.SicText_2 | SICCode.SicText_3 | SICCode.SicText_4 | LimitedPartnerships.NumGenPartners | LimitedPartnerships.NumLimPartners | URI | PreviousName_1.CONDATE | PreviousName_1.CompanyName | PreviousName_2.CONDATE | PreviousName_2.CompanyName | PreviousName_3.CONDATE | PreviousName_3.CompanyName | PreviousName_4.CONDATE | PreviousName_4.CompanyName | PreviousName_5.CONDATE | PreviousName_5.CompanyName | PreviousName_6.CONDATE | PreviousName_6.CompanyName | PreviousName_7.CONDATE | PreviousName_7.CompanyName | PreviousName_8.CONDATE | PreviousName_8.CompanyName | PreviousName_9.CONDATE | PreviousName_9.CompanyName | PreviousName_10.CONDATE | PreviousName_10.CompanyName | ConfStmtNextDueDate | ConfStmtLastMadeUpDate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
25564 | CHAND PROPERTIES LIMITED | 09622267 | NaN | NaN | THE OFFICE, 4A ALLENDALE ROAD | CAISTER-ON-SEA | GREAT YARMOUTH | NORFOLK | ENGLAND | NR30 5ES | Private Limited Company | Active | United Kingdom | NaN | 03/06/2015 | 30.0 | 6.0 | 30/06/2021 | 30/06/2019 | NaN | 01/07/2017 | 03/06/2016 | 0 | 0 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/09622267 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 17/06/2021 | 03/06/2020 |
29464 | CHAPMANS,NORWICH.LIMITED | 00294792 | NaN | NaN | UNIT 1, BENNETT HOUSE | NORWICH STREET | DEREHAM | NORFOLK | NaN | NR19 1BX | Private Limited Company | Active | United Kingdom | NaN | 07/12/1934 | 31.0 | 10.0 | 31/07/2021 | 31/10/2019 | TOTAL EXEMPTION FULL | 27/09/2016 | 30/08/2015 | 1 | 0 | 0 | 1 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/00294792 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 13/09/2021 | 30/08/2020 |
35229 | CHARNDATE LIMITED | 05176236 | NaN | NaN | YEW TREE FARM | PULHAM MARKET | DISS | NORFOLK | NaN | IP21 4XN | Private Limited Company | Active | United Kingdom | NaN | 12/07/2004 | 31.0 | 3.0 | 31/12/2021 | 31/03/2020 | TOTAL EXEMPTION FULL | 09/08/2016 | 12/07/2015 | 0 | 0 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/05176236 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 21/07/2021 | 07/07/2020 |
40120 | CHEALL & COE PROPERTIES LTD | 04956854 | NaN | NaN | 58B SHAKESPEARE WAY | TAVERHAM | NORWICH | NORFOLK | NaN | NR8 6SL | Private Limited Company | Active | United Kingdom | NaN | 07/11/2003 | 31.0 | 12.0 | 30/09/2021 | 31/12/2019 | TOTAL EXEMPTION FULL | 05/12/2016 | 07/11/2015 | 2 | 2 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/04956854 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 20/12/2020 | 08/11/2019 |
43431 | CHELSEA INVESTMENT PROPERTIES LTD | 11646725 | NaN | NaN | WATERLOO HOUSE | 17 WATERLOO ROAD | NORWICH | NORFOLK | UNITED KINGDOM | NR3 1EH | Private Limited Company | Active | United Kingdom | NaN | 29/10/2018 | 31.0 | 1.0 | 31/10/2021 | 31/01/2020 | TOTAL EXEMPTION FULL | 26/11/2019 | NaN | 11 | 11 | 0 | 0 | 68209 - Other letting and operating of own or ... | NaN | NaN | NaN | 0 | 0 | http://business.data.gov.uk/id/company/11646725 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 11/11/2021 | 28/10/2020 |
df_estate_agents_norfolk.shape
(834, 55)
The logical progression to this project is to use lookalike modeling. To do this, you’d first need to augment your customer data by adding in the SIC code, then examine the differences in customer performance and value by SIC code, and then use the dataset to identify more customers who look like your best ones…
While not particularly difficult, it does require some complicated fuzzy matching to allow you to connect the SIC codes of your customers (which may not be ordering via their head office address) with the addresses held in the Companies House dataset.
Matt Clarke, Saturday, March 13, 2021