Pandas

Articles tagged Pandas

How to run time-based SEO tests using Python

One of the problems with search engine optimisation or SEO is that search engine algorithms are essentially black boxes. They analyse so many on-page and off-page factors, and use multiple...

How to identify SEO keyword opportunities with Python

One of the most useful Python SEO projects you can undertake is to identify the top keywords for which each of your site’s pages ranks. Sometimes, these keywords...

How to add days and subtract days from dates in Pandas

If you regularly work with time series data, one common thing you’ll need to do is add and subtract days from a date. If you tried doing this by hand,...
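As a quick illustration, which is not necessarily the approach the full article takes, Pandas can shift dates using pd.Timedelta. It can also apply a different number of days to each row using pd.to_timedelta. The dataframe and its column names below are hypothetical.

```python
import pandas as pd

# Hypothetical order data; the column names are illustrative only
df = pd.DataFrame({"order_date": pd.to_datetime(["2023-01-30", "2023-02-27", "2023-12-31"])})

# Add and subtract a fixed number of days using a Timedelta
df["plus_7_days"] = df["order_date"] + pd.Timedelta(days=7)
df["minus_7_days"] = df["order_date"] - pd.Timedelta(days=7)

# Add a different number of days to each row by converting an integer column to timedeltas
df["lead_time"] = [3, 5, 1]
df["due_date"] = df["order_date"] + pd.to_timedelta(df["lead_time"], unit="D")
print(df)
```

The arithmetic runs on real datetime values, so month-end and year-end rollovers are handled automatically. For example, 31 December 2023 plus seven days becomes 7 January 2024.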

How to identify striking distance keywords with Python

Striking distance keywords are those which appear just off the bottom of the first page of search engine results. Keywords that appear on the first page have the greatest visibility...

How to create monthly Google Search Console API reports with EcommerceTools

Google Search Console is a really useful tool for search marketers since it shows what is happening data-wise before organic search visitors reach your website. Google Analytics only shows you...

How to use the Mailchimp Marketing Python API with Pandas

In ecommerce, email marketing remains one of the most effective (and cost-effective) digital marketing techniques, especially when combined with data science techniques. The vast amounts of customer data generated in...

How to use the eBay Finding API with Python

The eBay Finding API gives you direct access to eBay search listings using a simple SDK. This API lets you search or query eBay to fetch specific search listings for...

How to export Zendesk tickets into Pandas using Zenpy

The Zendesk customer service platform is widely used by ecommerce businesses, but its functionality for analysing ticket trends and automatically classifying them is somewhat limited. In many cases, you might...

How to query the Google Search Console API with EcommerceTools

The Google Search Console (GSC) API is a great source of information for those working in SEO, marketing, or ecommerce. It can tell you which of your pages are appearing...

How to read Google Sheets data in Pandas with GSpread

GSpread is a Python package that makes it quick and easy to read data from, and write data to, the Google Sheets spreadsheets stored in your Google Drive. With a tiny...

How to calculate the Lin Rodnitzky Ratio using GAPandas

The Lin Rodnitzky Ratio is a calculation designed to help search engine marketers assess the management of paid search campaigns and account structure. When managing paid search advertising accounts you...
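The excerpt stops before the formula. A commonly cited definition takes the average cost per conversion across all search terms and divides it by the average cost per conversion across only the search terms that converted. The sketch below applies that definition to a hypothetical search term report using plain Pandas. It does not use GAPandas, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical search term report: spend and conversions per search term
terms = pd.DataFrame({
    "search_term": ["red shoes", "blue shoes", "cheap shoes", "shoe shop"],
    "cost": [200.0, 150.0, 100.0, 50.0],
    "conversions": [4, 2, 0, 0],
})

# Average cost per conversion across all search terms, including non-converting ones
cpa_all = terms["cost"].sum() / terms["conversions"].sum()

# Average cost per conversion across only the search terms that converted
converting = terms[terms["conversions"] > 0]
cpa_converting = converting["cost"].sum() / converting["conversions"].sum()

lin_rodnitzky_ratio = cpa_all / cpa_converting
print(round(lin_rodnitzky_ratio, 2))
```

Both averages share the same conversion count, so the ratio simplifies to total spend divided by spend on converting terms. Here that is 500 / 350, or about 1.43.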

How to analyse product replenishment

Subscription commerce was all the rage for a while, but it’s not really become as popular as many in ecommerce perhaps envisaged. While we may have subscriptions for certain things,...

Data science courses for budding data scientists and data engineers

If you want to change careers and move into the data science or data mining field, as either a data scientist or a data engineer, or simply improve your skills,...

How to read an RSS feed in Python

RSS feeds have been a mainstay on the web for over 20 years now. These XML-based documents are generated by web servers and designed to be read in RSS feed...
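Because RSS is plain XML, a minimal sketch needs only Python's standard library. The feed below is a hypothetical inline RSS 2.0 document; a real feed would be downloaded first. Atom feeds and namespaced elements need extra handling, and a dedicated package such as feedparser smooths over those differences.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 document; in practice this would be fetched over HTTP
rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item><title>First post</title><link>https://example.com/first</link></item>
    <item><title>Second post</title><link>https://example.com/second</link></item>
  </channel>
</rss>"""

root = ET.fromstring(rss)
channel_title = root.findtext("channel/title")

# Collect the title and link of every <item> element in the feed
items = [
    {"title": item.findtext("title"), "link": item.findtext("link")}
    for item in root.iter("item")
]
print(channel_title, items)
```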

19 Python SEO projects that will improve your site

Although I have never really considered myself a technical SEO, I do need to do quite a bit of SEO work as part of my role as an Ecommerce Director....

How to identify internal and external links using Python

Internal linking helps improve the user experience by recommending related content to users, which both reduces bounce rate and helps search engine optimisation efforts. While there are no hard and...
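One common way to separate internal links from external ones is to resolve each href against the page URL and then compare hostnames. This is a sketch under stated assumptions. The classify_links helper is hypothetical, and it treats www.example.com and example.com as different sites, which you may want to normalise.

```python
from urllib.parse import urljoin, urlparse


def classify_links(page_url, hrefs):
    """Hypothetical helper: split raw href values into internal and external absolute URLs."""
    site_host = urlparse(page_url).netloc.lower()
    internal, external = [], []
    for href in hrefs:
        # Resolve relative links such as "/about" against the page URL
        absolute = urljoin(page_url, href)
        parsed = urlparse(absolute)
        # Skip non-web links such as mailto: or javascript:
        if parsed.scheme not in ("http", "https"):
            continue
        if parsed.netloc.lower() == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


internal, external = classify_links(
    "https://example.com/blog/post",
    ["/about", "contact", "https://example.com/shop", "https://other.org/page", "mailto:hi@example.com"],
)
print(internal)
print(external)
```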

How to scrape Google results in three lines of Python code

EcommerceTools makes it really quick and easy to scrape Google search engine results in Python. In this simple project, we’ll use EcommerceTools to search Google for your chosen keywords, scrape...

How to create a simple product recommender system in Pandas

Product recommender systems, or recommendation systems as they’re also known, are ubiquitous on ecommerce websites these days. They’re relatively simple to create and even fairly basic ones can give striking...

How to create PDF reports in Python using Pandas and Gilfoyle

While reporting is often quite a useful way to stay on top of your data, it’s also something you can automate to save time, even if your reports include custom...

How to create monthly Google Analytics reports in Pandas

Like most people who work in ecommerce and marketing, I spend a lot of time in Google Analytics. It’s a great tool, but when reporting on the numbers, it helps...

How to segment customers using RFM and ABC

While the Recency, Frequency, Monetary value or RFM approach to customer segmentation might be old, it’s based on sound science, so no matter what customer model you’re building, it’s generally...

How to perform a customer cohort analysis in Pandas

Cohort analysis is unlike most other customer segmentation techniques in that it typically uses a time-based element. It’s commonly used to segment customers into groups, or cohorts, based on their...
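A minimal sketch of that time-based element: each customer is assigned to a cohort based on the month of their first order. Every later order is then labelled with the number of months elapsed since that first month. The orders table is hypothetical.

```python
import pandas as pd

# Hypothetical orders table
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-01-20", "2023-03-02", "2023-02-14"]),
})

# Each customer's first order date defines their cohort (acquisition month)
first_order = orders.groupby("customer_id")["order_date"].transform("min")
orders["cohort"] = first_order.dt.to_period("M")

# Whole calendar months elapsed between the cohort month and the order month
orders["period"] = (
    (orders["order_date"].dt.year - first_order.dt.year) * 12
    + (orders["order_date"].dt.month - first_order.dt.month)
)

# Active customers per cohort (rows) and months since acquisition (columns)
cohort_table = (
    orders.groupby(["cohort", "period"])["customer_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(cohort_table)
```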

How to calculate CLV using BG/NBD and Gamma-Gamma

Calculating Customer Lifetime Value or CLV is considered a really important thing in marketing and ecommerce, yet most companies can’t do it properly. This clever metric tells you the predicted...

How to assign RFM scores with quantile-based discretization

RFM segmentation is one of the oldest and most effective ways to segment customers. RFM models are based on three simple values - recency, frequency, and monetary value - which...
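As a sketch of quantile-based discretization, pd.qcut can split each RFM metric into quartiles and map them to scores. The data, the choice of four bins, and the 1 to 4 scale are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-customer RFM metrics
rfm = pd.DataFrame({
    "customer_id": range(1, 9),
    "recency_days": [5, 40, 12, 200, 90, 3, 60, 150],
    "frequency": [10, 2, 6, 1, 3, 12, 4, 1],
    "monetary": [500.0, 80.0, 320.0, 20.0, 150.0, 900.0, 210.0, 35.0],
})

# Quartile scores from 1 (worst) to 4 (best); lower recency is better, so its labels are reversed
rfm["R"] = pd.qcut(rfm["recency_days"], q=4, labels=[4, 3, 2, 1]).astype(int)

# Ranking first breaks ties, which can otherwise produce duplicate bin edges for frequency
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), q=4, labels=[1, 2, 3, 4]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], q=4, labels=[1, 2, 3, 4]).astype(int)

# Concatenate the three scores into a single segment code such as "444"
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)
```

Ranking the frequency column first is a common workaround. Many low-frequency customers share identical values, which can make pd.qcut raise an error about non-unique bin edges.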

How to engineer customer purchase latency features

Purchase latency or customer latency is a measure of the number of days between a customer’s orders and is one of the most powerful features in many propensity and churn...
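Here is a sketch of how latency features could be engineered in Pandas, assuming a hypothetical orders table with one row per order. The steps are to sort by customer and date, take the difference between consecutive orders with groupby().diff(), and then summarise the results per customer.

```python
import pandas as pd

# Hypothetical orders table
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime(
        ["2023-01-01", "2023-01-11", "2023-01-31", "2023-02-01", "2023-03-03", "2023-02-15"]
    ),
})

# Sort so each customer's orders are in date order before differencing
orders = orders.sort_values(["customer_id", "order_date"])

# Days since the same customer's previous order (NaN for a customer's first order)
orders["latency_days"] = orders.groupby("customer_id")["order_date"].diff().dt.days

# Per-customer latency features; one-time buyers get NaN and need handling before modelling
latency = orders.groupby("customer_id")["latency_days"].agg(["mean", "min", "max"])
print(latency)
```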

How to create targeted B2B company sector datasets

As I explained in my previous post, many B2B ecommerce businesses spend huge amounts on procuring third-party data for companies they wish to target. However, with some data science skills...

How to unserialize serialized PHP arrays using Python

If you regularly work with ecommerce data, you’re likely to have encountered PHP serialized arrays or objects. Serialization is a process used to take a complex data structure, such as...

How to identify keyword cannibalisation using Python

Keyword cannibalisation occurs when you have several pages ranking for the same phrase, effectively putting them in competition with each other for search engine rankings. Since Google generally only shows...
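One simple way to flag candidates is to count how many distinct pages rank for each query. This sketch assumes a hypothetical Search Console-style export with one row per query and page. Repeated rows for the same query and page could come from different dates.

```python
import pandas as pd

# Hypothetical Search Console-style export
gsc = pd.DataFrame({
    "query": ["running shoes", "running shoes", "trail shoes", "hiking boots", "hiking boots", "hiking boots"],
    "page": ["/shoes/running", "/blog/best-running-shoes", "/shoes/trail",
             "/boots/hiking", "/blog/hiking-guide", "/boots/hiking"],
    "clicks": [120, 30, 80, 60, 15, 5],
})

# Count distinct ranking pages per query; more than one suggests possible cannibalisation
pages_per_query = gsc.groupby("query")["page"].nunique()
cannibalised = pages_per_query[pages_per_query > 1].index

# Show the competing pages and their total clicks for each affected query
competing = (
    gsc[gsc["query"].isin(cannibalised)]
    .groupby(["query", "page"], as_index=False)["clicks"].sum()
    .sort_values(["query", "clicks"], ascending=[True, False])
)
print(competing)
```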

How to analyse Pandas dataframes using SQL with PandaSQL

If, like me, you’ve come from a background where you made heavy use of SQL, then getting to grips with filtering, subsetting, and selecting data in Pandas can be a...

How to geocode and map addresses using GeoPy

In the field sales sector, one common thing you’ll want to do is identify all the potential clients you have within a particular region, so you can assign your team...

How to create paid search keywords using Pandas

Setting up keywords for new paid search accounts can be a repetitive and time-consuming process. While it’s historically been done using Excel, many digital marketers are now taking advantage of...

How to use the Pandas value_counts() function

The Pandas value_counts() function can be used to count the number of times a value occurs within a dataframe column or series, as well as to calculate frequency distributions. Here’s a...
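A short sketch on a hypothetical series of payment methods. It shows the raw counts and, with normalize=True, the relative frequencies.

```python
import pandas as pd

# Hypothetical payment methods used on six orders
payment_methods = pd.Series(["Card", "PayPal", "Card", "Card", "Klarna", "PayPal"], name="payment_method")

counts = payment_methods.value_counts()                # absolute counts, most frequent first
shares = payment_methods.value_counts(normalize=True)  # relative frequencies that sum to 1
print(counts)
print(shares.round(3))
```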

How to query MySQL and other databases using Pandas

For years, I used to spend much of my time performing Exploratory Data Analysis directly in SQL. Over time, the queries I wrote became very complicated, and it was often...

The four Python data science libraries you need to learn

There are hundreds of excellent Python data science libraries and packages that you’ll encounter when working on data science projects. However, there are four of them that you’ll probably use...

How to visualise data using line charts in Seaborn

Line charts, line graphs, or line plots are among the most widely used data visualisations. They’re ideal for time series data in which you’re plotting a metric on the y...

How to visualise data using barplots in Seaborn

Barplots or bar charts are probably the most widely used visualisation for displaying and comparing categorical variables. They’re very easy to understand and are quick and easy to generate.

How to visualise correlations using Pandas and Seaborn

Pearson’s product-moment correlation, or Pearson’s r, is a statistical method commonly used in data science to measure the strength of the linear relationship between variables. If you can identify existing...

How to visualise categorical data in Seaborn

Categorical data can be visualised in many ways, and there’s no requirement to stick to the standard bar chart. Here is a selection of attractive Seaborn charts, graphs, and plots...

How to create a dataset for product matching models

Product matching (or data matching) is a computational technique employing Natural Language Processing, machine learning, or deep learning, which aims to identify identical products being sold on different websites, where...

How to visualise analytics data using heatmaps in Seaborn

Heatmaps are one of the most intuitive ways to display data across two dimensions, and they work particularly well on temporal data, such as web analytics metrics. They’re a great...

How to visualise RFM data using treemaps

Recent papers on the Recency, Frequency, Monetary (RFM) model, such as this one by Inanc Kabasakal which was published earlier this year, have started to adopt text-based labels to help...

How to visualise data using scatterplots in Seaborn

Scatterplots, scatter graphs, scatter charts, or scattergrams are among the most popular mathematical plots and represent one of the best ways to visualise the relationship of data on two...

How to visualise data using histograms in Pandas

During the Exploratory Data Analysis (EDA) stage, one of the key things you’ll want to do is understand the statistical distribution of your data. Histograms are one of the...

How to visualise data using boxplots in Seaborn

The boxplot, or box-and-whisker diagram, is one of the most useful ways to visualise statistical distributions in data. While they can seem a bit unintuitive when you first look at...

How to use SMOTE for imbalanced classification

Imbalanced classification problems, such as the detection of fraudulent card payments, represent a significant challenge for machine learning models. When the target class, such as fraudulent transactions, makes up such...

How to use Recursive Feature Elimination in your models using RFECV

Something which often confuses non-data scientists is that too many features can be a bad thing for a model. It does sound logical that including more features and data...

How to transform categorical variables using encoders

There are loads of different ways to convert categorical variables into numeric features so they can be used within machine learning models. While you can perform this process manually on...

How to select, filter, and subset data in Pandas dataframes

Selecting, filtering and subsetting data is probably the most common task you’ll undertake if you work with data. It allows you to extract subsets of data where row or column...

How to resample time series data in Pandas

When working with time series data, such as web analytics data or ecommerce sales, the time series format in your dataset might not be ideal for the analysis you’re performing...
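A minimal sketch of resampling, assuming a hypothetical daily revenue series. Note that resample() needs either a DatetimeIndex or a datetime column passed via on=.

```python
import pandas as pd

# Hypothetical daily revenue indexed by date (2 January 2023 is a Monday)
daily = pd.DataFrame(
    {"revenue": [100, 150, 200, 50, 80, 120, 90, 300, 250, 60]},
    index=pd.date_range("2023-01-02", periods=10, freq="D"),
)

# Aggregate daily rows into weekly totals; "W-SUN" weeks end on (and are labelled by) Sunday
weekly = daily.resample("W-SUN").sum()

# Aggregate to monthly means instead; "MS" labels each month by its first day
monthly = daily.resample("MS").mean()
print(weekly)
print(monthly)
```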

How to reformat dates in Pandas

If you regularly work with time series data in Pandas it’s probable that you’ll sometimes need to convert dates or datetimes and extract additional features from them.

How to import data into Pandas dataframes

Pandas allows you to import data from a wide range of data sources directly into a dataframe. These can be static files, such as CSV, TSV, fixed width files, Microsoft...

How to group and aggregate transactional data using Pandas

Transactional item data can be used to create a number of other useful datasets to help you analyse ecommerce products and customers. From the core list of items purchased you...
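As a sketch of that idea, a hypothetical table of order lines can be rolled up into order-level, customer-level, and product-level datasets. This uses groupby() with named aggregations.

```python
import pandas as pd

# Hypothetical transactional item data: one row per item line in an order
items = pd.DataFrame({
    "order_id": [101, 101, 102, 103, 103, 103],
    "customer_id": [1, 1, 2, 1, 1, 1],
    "sku": ["A", "B", "A", "C", "A", "B"],
    "quantity": [1, 2, 1, 1, 3, 1],
    "line_total": [10.0, 30.0, 10.0, 25.0, 30.0, 15.0],
})

# Order-level dataset: revenue and units per order
orders = items.groupby(["order_id", "customer_id"], as_index=False).agg(
    revenue=("line_total", "sum"),
    units=("quantity", "sum"),
)

# Customer-level dataset built from the order-level one
customers = orders.groupby("customer_id").agg(
    orders=("order_id", "nunique"),
    revenue=("revenue", "sum"),
    aov=("revenue", "mean"),  # average order value
)

# Product-level dataset built from the item lines
products = items.groupby("sku").agg(units=("quantity", "sum"), revenue=("line_total", "sum"))
print(orders, customers, products, sep="\n\n")
```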

How to create ecommerce sales forecasts using Prophet

Time series forecasting models are notoriously tricky to master, especially in ecommerce, where you have seasonality, the weather, marketing promotions, and holidays to consider. Not to mention pandemics.

How to create a response model to improve outbound sales

The predictive response models used to help identify customers in marketing can also be used to help outbound sales teams improve their call conversion rate by targeting the best people...

How to analyse search traffic using the Google Trends API

The things we search for online can reveal a remarkable amount about us, even when viewed in aggregate on an anonymous level. For many years, Google has made some of...

How to identify visually similar images using hashing

Image hashing (or image fingerprinting) is a technique that is used to convert an image to an alphanumeric string. While this might sound somewhat pointless, it actually has a number...

How to create an ABC XYZ inventory classification model

As everyone who works in ecommerce will know, stock-outs on your key lines can have a massive negative impact on sales and your marketing costs. In many cases, you’ll be...

How to speed up the NLP text annotation process

When you’re building a Natural Language Processing model, it’s the text annotation process which is the most laborious and the most expensive for your business. While you can use tools...

How to import data into Google Data Studio using Python

Google Data Studio has native support for a range of platforms, but there’s no reliable means of pushing data in from Python without going via another data source. Google BigQuery...

How to import data into BigQuery using Pandas and MySQL

Google BigQuery is a “serverless” data warehouse platform hosted on the Google Cloud Platform. The serverless approach means you don’t have to maintain a server yourself and Google looks after...

How to create synthetic data sets for machine learning

While there are many open source datasets available for you to use when learning new data science techniques, sometimes you may struggle to find a data set to use to...

How to create an ABC inventory classification model

ABC inventory classification has been one of the most widely used methods of stock control in operations management for decades. It’s an intentionally simple system in which products are assigned...
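A minimal sketch of ABC classification by cumulative revenue share. The 80% and 95% cut-offs are a common convention rather than a fixed rule, and the SKU data is hypothetical.

```python
import pandas as pd

# Hypothetical annual revenue per SKU
sales = pd.DataFrame({
    "sku": ["A1", "B2", "C3", "D4", "E5", "F6", "G7", "H8"],
    "revenue": [5000, 2500, 1200, 600, 300, 200, 120, 80],
})

# Rank SKUs by revenue and compute each SKU's cumulative share of total revenue
sales = sales.sort_values("revenue", ascending=False)
sales["cum_share"] = sales["revenue"].cumsum() / sales["revenue"].sum()


def abc_class(cum_share):
    # Assumed cut-offs: A up to 80% of revenue, B up to 95%, C for the long tail
    if cum_share <= 0.80:
        return "A"
    if cum_share <= 0.95:
        return "B"
    return "C"


sales["abc"] = sales["cum_share"].apply(abc_class)
print(sales)
```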

How to connect to MySQL via an SSH tunnel in Python

Many MySQL databases are configured to accept connections from other servers on the local network and will reject connections from remote machines. Ordinarily, you could work around this by creating...

How to bin or bucket customer data using Pandas

Data binning, bucketing, or discrete binning, is a very useful technique for both preprocessing and understanding or visualising complex data, especially during the customer segmentation process. It’s applied to continuous...
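A sketch of two common binning approaches in Pandas, applied to hypothetical customer data. pd.cut uses hand-chosen bin edges, while pd.qcut creates equal-frequency buckets. The bin edges and labels are assumptions.

```python
import pandas as pd

# Hypothetical customer ages and order counts
customers = pd.DataFrame({"age": [19, 25, 34, 47, 52, 68], "orders": [1, 1, 2, 3, 5, 8]})

# Hand-chosen bin edges with readable labels; right=False makes the bins [18, 30), [30, 45) and so on
customers["age_band"] = pd.cut(
    customers["age"],
    bins=[18, 30, 45, 60, 120],
    labels=["18-29", "30-44", "45-59", "60+"],
    right=False,
)

# Equal-frequency bins: roughly the same number of customers lands in each bucket
customers["order_bucket"] = pd.qcut(customers["orders"], q=3, labels=["low", "mid", "high"])
print(customers)
```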

How to use Category Encoders to encode categorical variables

Most datasets you’ll encounter will probably contain categorical variables. They are often highly informative, but the downside is that they’re based on object or datetime data types such as text...

How to create an ecommerce trading calendar using Pandas

In both B2C and B2B ecommerce, special trading periods such as Christmas, Mother’s Day, and Valentine’s Day can often greatly contribute to sales. Indeed, the introduction of Black Friday sales...

How to use the Pandas melt function to reshape wide format data

When you gain access to a new dataset, chances are, it’s probably not in the format you require for analysis or modelling. The most common problem you’ll encounter is datasets...
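A minimal sketch of reshaping wide data with melt(), assuming a hypothetical table with one column per month.

```python
import pandas as pd

# Hypothetical wide-format data: one column per month
wide = pd.DataFrame({
    "sku": ["A1", "B2"],
    "jan": [10, 4],
    "feb": [12, 6],
    "mar": [9, 7],
})

# Reshape to long format: one row per SKU and month
long = wide.melt(id_vars="sku", var_name="month", value_name="units")
print(long)
```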

How to scrape competitor technology data in Python

In ecommerce, it pays to watch what your competitors are doing, so over the past decade or so in which I’ve managed ecommerce businesses, I’ve regularly undertaken competitor analyses. They’re...

How to create a Pandas dataframe

The massive versatility of Pandas means that you can create dataframes from almost any type of raw data. Whether you have a list, a list of lists, a dictionary, a...
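A short sketch of three common ways to construct a dataframe, using hypothetical product data. All three produce the same dataframe.

```python
import pandas as pd

# From a list of lists, supplying the column names separately
rows = [["A1", "Red shoes", 49.99], ["B2", "Blue shoes", 39.99]]
df_from_rows = pd.DataFrame(rows, columns=["sku", "name", "price"])

# From a dictionary of lists: the keys become column names
df_from_dict = pd.DataFrame({
    "sku": ["A1", "B2"],
    "name": ["Red shoes", "Blue shoes"],
    "price": [49.99, 39.99],
})

# From a list of dictionaries, such as records returned by an API
records = [
    {"sku": "A1", "name": "Red shoes", "price": 49.99},
    {"sku": "B2", "name": "Blue shoes", "price": 39.99},
]
df_from_records = pd.DataFrame(records)
print(df_from_records)
```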

How to create a collaborative filtering recommender system

Recommender systems, or recommendation engines as they’re also known, are everywhere these days. Whether you’re looking for books on Amazon, tracks on Spotify, movies on Netflix or a date on...

How to use GAPandas to view your Google Analytics data

Over the past decade I’ve written more Google Analytics API queries than I can remember. Initially, I favoured PHP for these (and still do for permanent web-based applications utilising GA...

How to use scikit-learn datasets in data science projects

The scikit-learn package comes with a range of small built-in toy datasets that are ideal for use in test projects and applications. As they’re part of the scikit-learn package, you...

How to use mean encoding in your machine learning models

When you’re building a machine learning model, the feature engineering step is often the most important. From your initial small batch of features, the clever use of maths and stats...

How to impute missing numeric values in your dataset

As models require numeric data and don’t like NaN, null, or inf values, if you find these within your dataset you’ll need to deal with them before passing the data...
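A sketch of one simple strategy on hypothetical data: convert infinite values to NaN, then fill each numeric column with its median. Median imputation is only one option. The mean, a constant, or a model-based imputer may suit your data better.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset containing NaN and inf values
df = pd.DataFrame({
    "order_value": [20.0, np.nan, 40.0, 60.0],
    "items": [1, 2, np.nan, 3],
    "discount": [0.0, 5.0, np.inf, 0.0],
})

# Treat infinite values as missing so they are imputed alongside NaN
df = df.replace([np.inf, -np.inf], np.nan)

# Fill each numeric column's gaps with that column's median
numeric_cols = df.select_dtypes("number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
print(df)
```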

How to engineer date features using Pandas

When dealing with temporal or time series data, the dates themselves often yield information that can vastly improve the performance of your model. However, to get the best from these...
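A sketch of basic date feature extraction with the .dt accessor, using a hypothetical order_date column. Which features actually help will depend on your model.

```python
import pandas as pd

# Hypothetical order dates
df = pd.DataFrame({"order_date": pd.to_datetime(["2023-12-24", "2024-01-06", "2024-02-29"])})

# Extract calendar components with the .dt accessor
df["year"] = df["order_date"].dt.year
df["month"] = df["order_date"].dt.month
df["quarter"] = df["order_date"].dt.quarter
df["day_of_year"] = df["order_date"].dt.dayofyear
df["day_of_week"] = df["order_date"].dt.dayofweek  # Monday=0, Sunday=6
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
print(df)
```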