When working with Python lists you’ll often encounter times when you need to remove duplicate values present in a single list, remove duplicates found in multiple lists, or identify the...
When working with customer data or upon time series data science projects, you’ll often find the need to calculate the difference between two dates in your MySQL queries. The MySQL...
When working with customer data in ecommerce it’s very common for data scientists to need to add and subtract from date values directly within MySQL queries. For example, you might...
The SQL CASE statement is used for flow control, much like an if, then, else statement. If the statement finds a match on the chosen condition it will return the...
Most SQL databases store dates in the datetime format. This is really useful because MySQL, and other SQL dialects, make it very easy to convert datetime objects to a wide...
MySQL includes a large number of string functions and operators that you can use in SQL statements to both query data and reformat column values. In this simple example, I’ll...
When you create an SQL SELECT statement, the data probably won’t be returned or sorted in the order you want, so the ORDER BY clause is used to control the...
The SQL GROUP BY clause groups row-based data into aggregated data, reducing the number of rows in the dataset, and is commonly used to perform aggregate calculations.
In SQL, when you want to SELECT data that lies between two values, there are a number of different SQL operators you can use to return the correct data. However,...
The SELECT statement is the most simple of all SQL queries and allows you to retrieve the precise data you want from one or more tables, or even databases. In...
SQL is one of the most widely used languages in data science so it’s important to know at least the basics required in order to fetch the data you’ll need...
The QR code or Quick Response code is a two-dimensional or matrix barcode invented in the early nineties by a Japanese car manufacturer. QR codes started to become popular in...
When creating business reports or running queries against a database or web analytics platform in a business setting, you’ll often need to know the start and end dates of the...
In ecommerce and marketing it’s relatively common to use ISO week numbers when reporting data. The ISO week system is a leap week calendar that forms part of the ISO...
Alongside the Python list, the dictionary is the most commonly used data storage structure in Python. Dictionaries allow you to store numeric and text-based data as a series of key-value...
Average Order Value or AOV is one of the most critical ecommerce metrics. Along with your sessions and conversion rate, it ultimately controls how much revenue an ecommerce business generates....
The requests HTTP library for Python allows you to make HTTP requests to servers and receive back HTTP status codes, site content, and other data. It’s extremely useful for building...
To extract specific data from the Google Analytics API you will often need to use segments and filters to ensure you get the data you want. For example, you might...
Google Analytics provides a useful Shopping Behaviour Analysis report that lets you examine the volumes of users who are performing important actions on your ecommerce website, such as viewing products,...
Exceptions are events that can modify the flow of control through a Python application and are triggered when errors occur. When writing production code it’s a good idea to both...
Pip is a command line application that allows you to install, upgrade, and remove Python packages from your development environment using simple commands. It works just like the Aptitude or...
Calculating the time difference between two dates in Pandas can yield useful information to aid your analysis, help you understand the data, and guide a machine learning model on making...
Dropbox is one of the most widely used file storage and file sharing platforms and is used by a wide variety of businesses. Dropbox has put the effort into building...
While Mailchimp is best known for its email marketing functionality, it also includes an excellent transactional email API designed to allow developers to send transactional emails from their applications.
XML feeds are a data format that uses Extensible Markup Language to provide structured data that can be read by search engines and online advertising providers. For example, a Google...
The Pandas apply() function allows you to run custom functions on the values in a Series or column of your Pandas dataframe. The Pandas apply function can be used for...
The Pandas describe() function generates descriptive statistics on the contents of a Pandas dataframe to show the central tendency, shape, distribution, and dispersion of variables. Examining descriptive statistics is the...
Web scraping is a programming technique that uses a script or bot to visit one or more websites and extract specific elements or HTML tags from the source code of...
There are two main ways you can grow the customer base of a business: you can either acquire more customers by increasing your customer acquisition rate, or you can improve...
Most off-the-shelf SEO tools come with a rank checker that allows you to monitor your position for given phrases in the Google search engine results. If you want to create...
Most ecommerce websites use review platforms, such as Feefo, Trustpilot, and Google Reviews, to allow customers to give feedback on their service and the products they sell. The reviews help...
One common task you’ll perform in Google Search Console is to compare the data from two different time periods to see how impressions, clicks, click-through rate (CTR), or average position...
The RFM model is probably one of the best known and most widely used customer segmentation models by data driven marketers. It’s used for both measuring customer value and predicting...
One of the problems with search engine optimisation or SEO is that search engine algorithms are essentially black boxes. They analyse so many on-page and off-page factors, and use multiple...
After work, when I’m not learning about data science, practising data science, or writing about data science, I like to browse classic car auction sites looking for cars I can’t...
Customer segmentation, and the similar and related field of market segmentation, are particularly relevant to the field of business-to-business (B2B) e-commerce. B2B customers often have a higher Customer Lifetime Value...
There are some great anomaly detection models available for Python, which let you examine complex data for a wide range of different anomaly types. In this project, I’ll show you...
One of the most useful Python SEO projects you can undertake is to identify the top keywords for which each of your site’s pages are ranking for. Sometimes, these keywords...
If you regularly work with time series data, one common thing you’ll need to do is add and subtract days from a date. If you tried doing this by hand,...
The demographics and interests data provided in Google Analytics can be a useful way to understand who is visiting your site or purchasing your products, without the need to perform...
Striking distance keywords are those which appear just off the bottom of the first page of search engine results. Keywords that appear on the first page have the greatest visibility...
Lead scoring is a Customer Relationship Management (CRM) process that involves segmenting CRM contacts based on their likelihood to make a purchase. Lead scoring is applied to both existing customers...
Mailchimp is one of the most widely used email service providers (or ESPs) in ecommerce and marketing. Since it is popular with those who only need its basic campaign features,...
Google Search Console is a really useful tool for search marketers since it shows what is happening data-wise before organic search visitors reach your website. Google Analytics only shows you...
In ecommerce, email marketing remains one of the most effective (and cost-effective) digital marketing techniques, especially when combined with data science techniques. The vast amounts of customer data generated in...
The eBay Finding API gives you direct access to eBay search listings using a simple SDK. This API lets you search or query eBay to fetch specific search listings for...
The Zendesk customer service platform is widely used by ecommerce businesses, but its functionality for analysing ticket trends and automatically classifying them is somewhat limited. In many cases, you might...
The Google Search Console (GSC) API is a great source of information for those working in SEO, marketing, or ecommerce. It can tell you which of your pages are appearing...
GSpread is a Python package that makes it quick and easy to read and write data from Google Sheets spreadsheets stored in your Google Drive into Python. With a tiny...
The Lin Rodnitzky Ratio is a calculation designed to help search engine marketers assess the management of paid search campaigns and account structure. When managing paid search advertising accounts you...
Subscription commerce was all the rage for a while, but it’s not really become as popular as many in ecommerce perhaps envisaged. While we may have subscriptions for certain things,...
If you want to change careers and move into the data science or data mining field, as either a data scientist or a data engineer, or simply improve your skills,...
Customer segmentation is the process of using data science techniques to create discrete groups of customers which share common characteristics or attributes. For example, a company might segment customers into...
RSS feeds have been a mainstay on the web for over 20 years now. These XML-based documents are generated by web servers and designed to be read in RSS feed...
Although I have never really considered myself a technical SEO, I do need to do quite a bit of SEO work as part of my role as an Ecommerce Director....
Internal linking helps improve the user experience by recommending related content to users, which both reduces bounce rate, and helps search engine optimisation efforts. While there are no hard and...
EcommerceTools makes it really quick and easy to scrape Google search engine results in Python. In this simple project, we’ll use EcommerceTools to search Google for your chosen keywords, use...
Product recommender systems, or recommendation systems, as they’re also known are ubiquitous on e-commerce websites these days. They’re relatively simple to create and even fairly basic ones can give striking...
Major internet retailers, like Walmart and Amazon, have been at the forefront of ecommerce data science and data analytics for many years, contributing lots of interesting papers to data science...
While reporting is often quite a useful way to stay on top of your data, it’s also something you can automate to save time, even if your reports include custom...
Like most people who work in ecommerce and marketing, I spend a lot of time in Google Analytics. It’s a great tool, but when reporting on the numbers, it helps...
Customer segmentation can give you huge insights into your business and identify a whole range of different things about your customers, allowing you to change your marketing and improve results....
There’s often a lot of faffing around required to get marketing and ecommerce data from various systems into Pandas so you can analyse it, or use it within more complex...
While the Recency, Frequency, Monetary value or RFM model for customer segmentation might be old, it’s based on sound science, so no matter what customer model you’re building, it’s generally...
Cohort analysis is unlike most other customer segmentation techniques in that it typically uses a time-based element. It’s typically used to segment customers into groups, or cohorts, based on their...
Whether you’re analysing content written in other languages using Natural Language Processing, or you want to assist your content team by translating their writing into other languages, machine translating software...
In those ecommerce businesses where relatively few products are launched and products have a relatively long lifecycle, copywriters tend to be targeted on producing unique content that sells the benefits...
In ecommerce, price monitoring is a really important consideration. If you offer your products at a price which is too high within the market, you may lose sales to rivals,...
Calculating Customer Lifetime Value or CLV is considered a really important thing in marketing and ecommerce, yet most companies can’t do it properly. This clever metric tells you the predicted...
RFM segmentation is one of the oldest and most effective ways to segment customers. RFM models are based on three simple values - recency, frequency, and monetary value - which...
You can learn many things about your products from the purchase behaviours and product consumption and product replenishment of your customers. Some items are purchased individually, some items are purchased...
If you’ve never heard of Spintax, you’ll definitely be aware of its typical usages. This text string replacement “language” is most commonly used for text spinning or article spinning, but...
As I’ve mentioned in previous posts on web scraping, the most efficient way to scrape data is to identify what Schema.org metadata is in use and then create a microdata...
People Also Ask or PAA boxes have been becoming increasingly common in Google’s search results over the past few years. They show a range of questions and answers related to...
Although I suspect you are probably not technically allowed to do it, I doubt there’s an SEO in the land who hasn’t scraped Google search engine results to analyse them,...
Neither Google Search Console nor Google Analytics gives you access to the data found in both systems in one place. However, with a bit of ingenuity and some relatively simple...
The Google Autocomplete feature, or Google Suggest as it was previously known, has become a part of everyday life for us all. Start typing a search term into Google, and...
Purchase latency or customer latency is a measure of the number of days between a customer’s orders and is one of the most powerful features in many propensity and churn...
As I explained in my previous post, many B2B ecommerce businesses spend huge amounts on procuring third-party data for companies they wish to target. However, with some data science skills...
According to the Harvard Business Review, the role of data scientist is said to be “the sexiest job of the 21st century”. Data science and data engineering skills are said...
In B2B ecommerce, there are two main approaches to new customer acquisition: you either rely on your website to acquire customers for you, or you target specific customers through sales...
One quick and easy way to understand the size of a website, and its growth rate, is to examine the number of its web pages Google has indexed. You can...
Although the techniques for reducing its impact have existed for decades, inventory management is still a huge issue in many businesses. Various things happen that can result in costly stock...
Successful operations management is crucial to the overall growth of an ecommerce business. While those in ecommerce, marketing, or data science, can work together the sales coming in and encourage...
Marketers can be just as obsessive about data as data scientists, so there are an abundance of well-researched marketing metrics available for analysing marketing performance. Most of the commonly used...
Customers are expensive to acquire but generate more and more profit as time goes on. Providing you nurture them, treat them kindly, and apologise and fix any mistakes that occur,...
Category management is a retail technique that breaks down a company’s product range into groups of related items, such as categories, or subcategories, or by their product type. By running...
The Google Knowledge Graph database includes an astronomical amount of data on almost every topic you can think of, allowing Google to create Knowledge Panels and infoboxes that summarise search...
Catalogue marketing is dying out. Over the past few years, virtually all the UK’s top catalogue retailers have stopped printing on paper and successfully transitioned their businesses online, either to...
The downside to building datasets using web scraping is that every site has custom HTML. If you scrape sites in this way, you’ll forever be building bespoke scrapers, and they’ll...
Most very large datasets tend to get compressed on servers to preserve storage space and bandwidth and allow them to be downloaded more quickly by end users. Python includes some...
If you regularly work with ecommerce data, you’re likely to have encountered PHP serialized arrays or objects. Serialization is a process used to take a complex data structure, such as...
The Google Analytics Measurement Protocol API lets you add data to your GA account that hasn’t been triggered by a user visiting a web page. Since it’s so flexible, you...
Many websites include Open Graph protocol data in their document head. This structured data allows social networks, such as Facebook and Twitter, to access specific elements of the page’s content...
When scraping websites, and when checking how well a site is configured for crawling, it pays to carefully check and parse the site’s robots.txt file. This file, which should be...
Scraping the titles and meta descriptions from every page on a site can tell you a great deal about its content, the underlying content strategy, or product ranges, and many...
Both 404 page not found errors and 301 redirect chains can be costly and damaging to the performance of a website. They’re both easy to introduce, especially on ecommerce sites...
Large, uncompressed images slow down your site, increase bandwidth costs, harm the user experience, and impact search engine rankings. In this project, I’ll show you how you can bulk resize...
XML sitemaps are designed to make life easier for search engines by providing an index of a site’s URLs. However, they’re also a useful tool in competitor analysis and allow...
URLs often contain useful information that can be used to analyse a website, a user’s search, or the breakdown of content present in each section. While they often look pretty...
Keyword cannibalisation occurs when you have several pages ranking for the same phrase, effectively putting them in competition with each other for search engine rankings. Since Google generally only shows...
In many data science projects you may need to download remote data, such as images, CSV files, or compressed data. Python makes it fairly straightforward to download files within your...
The Economic Order Quantity or EOQ represents the optimum purchase quantity for a given product, while aiming to minimise holding costs, shortage costs, and order costs. It’s most commonly calculated...
Unless you’re building a large and complex web scraper using Scrapy or Selenium, it’s probable that you’ll utilise Requests and Beautiful Soup. These two packages are brilliant for web scraping....
Back in 2020, Google introduced Web Vitals, a set of metrics is designed to help site owners to optimise the user experience on their websites, so pages are quick to...
If, like me, you’ve come from a background where you made heavy use of SQL, then getting to grips with filtering, subsetting, and selecting data in Pandas can be a...
If your site’s pages aren’t indexed by Google, you’re obviously not going to generate any traffic to them, so you’ll want to check that everything you expect to be present...
Google Search Console contains loads of really useful information for technical SEO. However, there are limits to what you can do using the front-end interface, and it takes time to...
The Screaming Frog SEO Spider Tool is widely used in digital marketing and ecommerce. It provides a user-friendly interface to a powerful site crawler and scraper that can be used...
Slack is a great tool for data scientists and data engineers and is now being adopted across businesses, so it’s probable that you already use it in your workplace. Besides...
In the field sales sector, one common thing you’ll want to do is identify all the potential clients you have within a particular region, so you can assign your team...
Setting up keywords for new paid search accounts can be a repetitive and time-consuming process. While it’s historically been done using Excel, many digital marketers are now taking advantage of...
Web scraping is a really useful skill in data science. We obviously need data for our models and analyses, but it’s not always easily available, so building our own datasets...
DRY, or Don’t Repeat Yourself, and the “Do One Thing” methodology are designed to help software engineers and data scientists create better functions. Code that isn’t written using DRY tends...
Charts and plots can often look a bit stale and professional, which might not be appropriate in every setting. If you want to dumb-down your charts and make them look...
Funnels are arguably one of the most powerful data visualisations you can use within the ecommerce field. At a glance, they can show you the proportion of customers entering at...
The flexibility of programming languages like Python means that any code you write to tackle a given problem will differ in approach and style to code written by someone else....
SQLite is a relational database management system (RDBMS) that is easy to access within Python and other languages. Unlike MySQL, PostgreSQL, and other databases, SQLite uses a serverless design, so...
For data scientists, Python operators are one of the most powerful and widely used features of this language. These special symbols or characters tell Python to perform some sort of...
Lists are one of the most widely used data storage objects or data types within Python and are used throughout every data science package. Along with the dictionary, tuple, and...
Git is the world’s most widely used version control system and is an essential tool for data scientists, especially those collaborating on projects with others. You’ll need to be able...
Docstrings are comment blocks that are added to the top of Python functions to explain the purpose of the function, describe the arguments that it accepts, and explain what the...
The Pandas value_counts() function can be used to count the number of times a value occurs within a dataframe column or series, as well as calculating frequency distributions. Here’s a...
For years, I used to spend much of my time performing Exploratory Data Analysis directly in SQL. Over time, the queries I wrote became very complicated, and it was often...
While data scientists may do nearly everything in Pandas, we also need to perform file operations in regular Python and in applications not tied to dataframes. Thankfully, Python makes it...
There are hundreds of excellent Python data science libraries and packages that you’ll encounter when working on data science projects. However, there are four of them that you’ll probably use...
Word clouds (also known as tag clouds, wordles, or weighted lists) have been around since the mid nineties and are one of the most effective data visualisations for representing the...
One of the key steps in the Exploratory Data Analysis process that comes before model development is to understand the statistical distribution of the variables or features within the data...
The Venn diagram is one of the most intuitive data visualisations for showing the overlap between two or three groups, or “sets”, of data. These diagrams were created in the...
Line charts, line graphs, or line plots are among the most widely used data visualisations. They’re ideal for time series data in which you’re plot a metric on the y...
Barplots or bar charts are probably the most widely used visualisation for displaying and comparing categorical variables. They’re very easy to understand and are quick and easy to generate.
Pearson’s product-moment correlation, or Pearson’s R, is a statistical method commonly used in data science to measure the strength of the linear relationship between variables. If you can identify existing...
Categorical data can be visualised in many ways, and there’s no requirement to stick to the standard bar chart. Here are a selection of attractive Seaborn charts, graphs, and plots...
One of the most annoying aspects of working with GPU-accelerated data science software, such as NVIDIA Rapids, TensorFlow, PyTorch and XGBoost, is that it can sometimes be very complicated and...
There are numerous websites I use for my work that don’t have dedicated desktop applications designed for Ubuntu Linux, such as GitHub, GitHub Gists, GitLab and Jira. However, it’s now...
If you’re working in data science, and especially if you’re working in deep learning, you’re going to need a decent workstation in order to be productive. Earlier this year I...
Heatmaps are one of the most intuitive ways to display data across two dimensions, and they work particularly well on temporal data, such as web analytics metrics. They’re a great...
Recent papers on the Recency, Frequency, Monetary or RFM model, such as the one by Inanc Kabasakal in 2020, have started to adopt text-based labels to help people understand the...
Scatterplots, scatter graphs, scatter charts, or scattergrams, are one of the most popular mathematical plots and represent one of the best ways to visualise the relationship of data on two...
During the Exploratory Data Analysis or EDA stage one of the key things you’ll want to do is understand the statistical distribution of your data. Histograms are one of the...
The boxplot, or box-and-whisker diagram, is one of the most useful ways to visualise statistical distributions in data. While they can seem a bit unintuitive when you first look at...
Selecting, filtering and subsetting data is probably the most common task you’ll undertake if you work with data. It allows you to extract subsets of data where row or column...
When working with time series data, such as web analytics data or ecommerce sales, the time series format in your dataset might not be ideal for the analysis you’re performing...
If you regularly work with time series data in Pandas it’s probable that you’ll sometimes need to convert dates or datetimes and extract additional features from them.
Pandas allows you to import data from a wide range of data sources directly into a dataframe. These can be static files, such as CSV, TSV, fixed width files, Microsoft...
Transactional item data can be used to create a number of other useful datasets to help you analyse ecommerce products and customers. From the core list of items purchased you...
The things we search for online can reveal a remarkable amount about us, even when viewed in aggregate on an anonymous level. For many years, Google has made some of...
Image hashing (or image fingerprinting) is a technique that is used to convert an image to an alphanumeric string. While this might sound somewhat pointless, it actually has a number...
As everyone who works in ecommerce will know, stock-outs on your key lines can have a massive negative impact on sales and your marketing costs. In many cases, you’ll be...
ABC inventory classification has been one of the most widely used methods of stock control in operations management for decades. It’s an intentionally simple system in which products are assigned...
Many MySQL databases are configured to accept connections from other servers on the local network and will reject connections from remote machines. Ordinarily, you could work around this by creating...
The Google Analytics add-on for Google Sheets allows you to use the Google Analytics reporting API to create custom weekly reports and schedule them to run. However, to run a...
In both B2C and B2B ecommerce, special trading periods such as Christmas, Mothers’ Day, and Valentines’ Day can often greatly contribute to sales. Indeed, the introduction of Black Friday sales...
The Precision 7000 series is the top of the range mobile workstation laptop from Dell and is aimed firmly at professional users who are doing high-end GPU-accelerated work, whether that’s...
When you gain access to a new dataset, chances are, it’s probably not in the format you require for analysis or modeling. The most common problem you’ll encounter is datasets...
Market Basket Analysis, or MBA, is a subset of affinity analysis and has been used in the retail sector for many years. It provides a computational method for identifying common...
In the ecommerce sector, you can learn a lot about your competitors and the expectations of your customers by analysing the reviews their customers leave for products and service on...
In ecommerce, it pays to watch what your competitors are doing, so over the past decade or so in which I’ve managed ecommerce businesses, I’ve regularly undertaken competitor analyses. They’re...
The massive versatility of Pandas means that you can create dataframes from almost any type of raw data. Whether you have a list, a list of lists, a dictionary, a...
Recommender systems, or recommendation engines as they’re also known, are everywhere these days. Whether you’re looking for books on Amazon, tracks on Spotify, movies on Netflix or a date on...
Over the past decade I’ve written more Google Analytics API queries than I can remember. Initially, I favoured PHP for these (and still do for permanent web-based applications utilising GA...
The scikit-learn package comes with a range of small built-in toy datasets that are ideal for using in test projects and applications. As they’re part of the scikit-learn package, you...
Regular expressions are used for pattern matching in programming, allowing you to identify or extract very specific pieces of text from a string or document. They’re very powerful and extremely...
When dealing with temporal or time series data, the dates themselves often yield information that can vastly improve the performance of your model. However, to get the best from these...
When you use pip to install Python packages from The Python Package Index (PyPi) they get stored in your site-packages directory and are used across your system whenever you run...