Search

Search the Practical Data Science blog for hundreds of free tutorials on data science, machine learning, and data engineering.

Results

How to tune a LightGBMClassifier model with Optuna

Learn how to create and tune a classification model using the LightGBM LightGBMClassifier and tun...

How to use the Pandas truncate() function

Pandas comes with two functions called truncate() that allow you to crop the top, bottom, or midd...

How to use Spacy for noun phrase extraction

Learn how to use Spacy to extract nouns and noun phrases from text using the Natural Language Pro...

How to use the Pandas filter() function

Learn how to use the Pandas filter() function to filter or subset a dataframe based on the column...

How to use Pandas shift() to create lagged variables

Learn how to use the Pandas shift() function to create lagged variables.

How to use Pandas to_json() to export JSON data

Learn how to use the Pandas to_json() function to export a dataframe to JSON, JSON lines, or a ho...

How to use the Pandas query() function

Learn how to use the Pandas query() function to search, filter, or subset a dataframe to show onl...

How to use Pandas from_records() to create a dataframe

Learn how to use the Pandas from_records() function to create a dataframe from a list of dictiona...

How to calculate an exponential moving average in Pandas

Learn how to use the Pandas ewm() function with mean() to calculate an exponential moving average...

How to use the Pandas map() function

Learn how to use the Pandas map() function to create a new dataframe column or series based on a ...

How to use Pandas pipe() to create data pipelines

The Pandas pipe() function can be used to create data pipelines and works well with the modern Pa...

How to use Pandas assign() to create new dataframe columns

Learn how to use the Pandas assign() method to create new dataframe columns and avoid the dreaded...

How to measure Python code execution times with timeit

Learn how to time Python code execution in Jupyter notebooks using the timeit magic command built...

How to use Pandas show_versions() to view package versions

Learn how to use pd.__version__ and pd.show_versions() to view the current version of Pandas inst...

How to use Pandas from_dict() to create a dataframe

Learn how to use the Pandas from_dict() function to create a Pandas dataframe from a Python dicti...

How to use method chaining in Pandas

Pandas method chaining is a modern way to improve Pandas code readability and performance by spli...

How to round values in Pandas using round()

Learn how to use the Pandas round() function and Numpy ceil() and floor() to round numbers to a g...

How to transpose a Pandas dataframe using T and transpose()

Learn how to transpose data and flip the orientation of a Pandas dataframe using the T and transp...

How to use Pandas to_numeric() to convert strings to numbers

Learn how to use the Pandas to_numeric() function to convert non-numeric strings or objects to nu...

How to use the Pandas set_index() and reset_index() functions

Learn how to use the Pandas set_index() and reset_index() functions to add and remove a single or...

How to use lambda functions in Pandas

Learn how to use lambda functions in Pandas to run small anonymous custom functions using apply()...

How to measure and reduce Pandas memory usage

Learn how to measure and reduce memory usage in Pandas using memory_usage(), info(), and categori...

How to calculate percentage change between columns in Pandas

Learn how to calculate the percentage change or percentage difference between two columns in a Pa...

How to get a list of national holiday dates in Python

Learn how to get a list of dates for the national holidays or bank holidays of any country using ...

How to use Spacy EntityRuler for custom Named Entity Recognition

Learn how to use the Spacy EntityRuler for custom Named Entity Recognition, or custom NER, by ext...

How to calculate Spearman's rank correlation coefficient in Pandas

Learn to calculate the Spearman's rank correlation coefficient, or Spearman's rho, using the Pand...

How to do custom Named Entity Recognition in Pandas using Spacy

Learn how to perform custom named entity recognition in Pandas with Spacy by analysing the sk...

How to calculate a rolling average or rolling mean in Pandas

Learn how to use the Pandas rolling() method to calculate the rolling mean, rolling average, or ...

How to reorder Pandas dataframe columns

Learn how to rearrange or reorder columns in Pandas into your desired order using square brackets...

How to split strings using the Pandas split() function

Learn how to use the Pandas split() function to split strings into lists or columns, including th...

How to use Pandas explode() to split a list column into rows

Learn how to use the Pandas explode() function to split a list column into multiple rows...

How to use Pandas std() to calculate standard deviation

Learn how to use the Pandas std() function to calculate the standard deviation of dataframe colum...

How to use Pandas sample() to show a sample of data

Learn how to use the Pandas sample() function to show a sample of data, including the lesser know...

How to use Pandas concat() to concatenate dataframes

Learn how to use the Pandas concat() function to concatenate dataframes and add new rows and colu...

How to get and set Pandas cell values with at[] and iat[]

Learn how to get and set the values of Pandas dataframe cells using at[] and iat[] methods to ext...

How to use pop() to drop a Pandas dataframe column

Learn how to use the Pandas pop() method to drop, remove, and extract a column from a dataframe o...

How to use Pandas head() and tail() to get the first and last rows

Learn how to select the first and last rows of a Pandas DataFrame using head() and tail(), flip t...

How to use append() to add rows to a Pandas dataframe

Learn how to use the append() function to add or append rows to a Pandas dataframe or join two da...

How to prefix or suffix Pandas column names and values

Learn how to add a prefix or suffix to Pandas column names and values using add_prefix(), add_suf...

How to find the most common value in a Pandas dataframe column

Learn how to find the most common or frequent value in a Pandas dataframe column using value_coun...

How to drop Pandas dataframe rows and columns

Learn how to use the Pandas drop() function to drop dataframe rows and columns based on column na...

How to calculate Pearson correlation coefficient in Pandas

Learn to use the Pandas corr() statistical method to compute the Pearson correlation coefficient ...

How to classify Google Search Console data in EcommerceTools

Learn how to use ABCD classification to classify Google Search Console page data in EcommerceTool...

How to use Pandas date_range() to create date ranges

Learn to use the Pandas date_range() function to create a DatetimeIndex, list, or dataframe of da...

How to slugify column names and values in Pandas

Learn how to slugify column names and values in Pandas by removing non-alphanumeric characters an...

How to identify and remove duplicate values in Pandas

Learn how to identify duplicate Pandas column values and rows using duplicated() and remove or de...

How to identify and count unique values in Pandas

Learn how to use the Pandas unique() and nunique() methods to identify and count unique values in...

How to create a customer retention model with XGBoost

Learn how to create a customer retention model (or churn model) in Python using XGBoost and Optun...

How to use sort_values() to sort a Pandas DataFrame

Learn how to use the sort_values() and sort_index() methods to sort a Pandas DataFrame by one or ...

How to use Pandas CategoricalDtype to create custom sort orders

Learn how to use the Pandas CategoricalDtype to create custom sort orders for weekdays, month nam...

How to convert a Pandas dataframe or series to a list

Learn how to convert a Pandas dataframe, series, or column, to a list or dictionary using the tol...

How to add a new column to a Pandas dataframe

Learn how to add new columns to a Pandas dataframe using insert, loc, assign, and by manually dec...

How to add feature engineering to a scikit-learn pipeline

Learn to use FunctionTransformer to create a scikit-learn pipeline that includes feature engineer...

How to tune a CatBoostClassifier model with Optuna

Learn how to create a classification model using CatBoostClassifier and tune its hyperparameters ...

How to tune an XGBRegressor model with Optuna

Learn how to tune the hyperparameters of an XGBRegressor regression model with Optuna and improve...

How to create and tune an AdaBoost classification model

Learn how to create an AdaBoost classification model using the scikit-learn AdaBoostClassifier an...

How to zip files and directories with Python

Learn how to zip files and directories with Python using the zipfile module for file compression.

How to list files and directories with Python

Learn how to list files and directories with Python using the os module, and the listdir(), walk(...

How to use a .gitignore file

Learn how to use a .gitignore file to keep your repository clean and prevent sensitive informatio...

How to use Spacy for POS tagging in Pandas

Learn how to use Spacy for POS tagging in Pandas and identify Parts of Speech, stopwords, and p...

How to convert a column list of dictionaries to a Pandas dataframe

Learn how to convert a Pandas column containing a list of Python dictionaries or JSON objects int...

How to transcribe YouTube videos with OpenAI Whisper

Learn how to download, transcode, and transcribe YouTube videos with the OpenAI Whisper Automatic...

How to create a Shopify price tracker with Python

Learn how to create a Shopify price tracker by scraping a Shopify ecommerce store with Python and...

How to create a QR code using Python

In this tutorial, we will learn how to create a Quick Response or QR code using Python, the QRCod...

How to use Optuna for XGBoost hyperparameter tuning

Learn how to use Optuna for XGBoost hyperparameter tuning by tuning model parameters on an XGBCla...

How to use NLTK for POS tagging in Pandas

Learn how to use NLTK for Part of Speech tagging in Pandas to analyse the text in a dataframe col...

How to create GitLab issues using the Python GitLab API

Learn how to create GitLab issues using Python and the GitLab API, so you can automatically assig...

How to transform numeric Pandas dataframe column values

Learn how to transform Pandas column values into other formats including float, int, datetime, an...

How to calculate the difference and percentage change between rows in Pandas

Learn how to use diff() and pct_change() in Pandas to calculate the difference and percentage cha...

How to calculate Customer Lifetime Value heuristics

Calculating Customer Lifetime Value or CLV is a lot harder to do than most people realise, especi...

How to create a fake review detection model

Learn how to build a fake review detection model using TfidfVectorizer and a range of machine lea...

How to use isna() to check for missing values in a Pandas dataframe

The Pandas isna() function is used to check for missing values in dataframe columns. Here's how t...

How to resize images in Python using Pillow

Learn how to resize images and create thumbnails in Python using Pillow - the Python Imaging Libr...

How to transcode a YouTube video to MP3 in Python

Learn how to download a YouTube video and transcode it to MP3 using Python and YouTube_DL from wi...

How to change Pandas dataframe settings and options

Learn how to change Pandas dataframe settings and options and change the maximum number of rows, ...

How to identify and change Pandas dtypes using info() and astype()

Learn how to check Pandas dtypes using info() and cast them to the correct data type using astype...

How to find the differences between two Pandas dataframes

Learn how to compare Pandas dataframes and individual columns to see if there are any differences...

24 tutorials to get you started using Pandas for data science

Pandas is one of the best tools in data science and a great reason to learn Python. Here are 24 P...

How to scrape a Shopify site in Python via products.json

Learn how to scrape a Shopify ecommerce site using Python, requests, and the products.json file, ...

How to split a Pandas column string or list into separate columns

Learn how to split or explode a string or Python list stored in a Pandas column into separate col...

How to export data from Pandas dataframes

Learn how to export data from Pandas dataframes to CSV, TSV, JSON, HTML, Feather, Parquet, Stata,...

How to bin data in Pandas with cut() and qcut()

The Pandas cut() and qcut() functions are used for data binning, data bucketing, and discretizati...

How to calculate the profitability of BOGOF and multibuy promotions

Multibuy promotions and Buy One, Get One Free or BOGOF deals are widely used in ecommerce. Here's...

How to analyse ecommerce coupon uplift with GAPandas

Learn how to use GAPandas to measure ecommerce coupon uplift and optimise the AOV of your website...

How to use CSS and XPath custom extraction in Advertools

Learn how to perform custom extraction when web scraping with Advertools using CSS selectors and ...

How to scrape a website using Advertools

Learn how to perform web scraping with Advertools in list mode to scrape an XML sitemap and scrap...

How to query the Google Analytics Data API for GA4 using Python

Learn how to query the Google Analytics Data API for GA4 using Python with GAPandas4 to fetch you...

How to get a list of the dimensions and metrics in your GA4 property

Query the Google Analytics Data API using GAPandas4 and get back a full list of dimensions and me...

How to create an ABC customer segmentation in Pandas

ABC customer segmentation using the Pareto principle or 80/20 rule to segment customers based on ...

How to rename columns in Pandas dataframes

Learn how to rename Pandas dataframe columns, add a prefix or suffix, convert them to lowercase o...

How to visualise internal linking in Python using NetworkX graphs

Learn how to scrape a website's internal linking in Python and visualise the connections between ...

How to perform tokenization in NLP with NLTK and Python

Learn how to tokenize text data in Pandas using the NLTK Punkt text tokenizer for Natural Languag...

How to calculate the ecommerce KPIs you need to hit your revenue target

Learn how to calculate the conversion rate, average order value, and amount of traffic your ecomm...

How to create a Naive Bayes text classification model using scikit-learn

Learn how to create a Multinomial Naive Bayes text classification model in Python using the sciki...

How to use cross validation in scikit-learn machine learning models

Learn how to use k-fold cross validation in scikit-learn to create more robust machine learning m...

How to create a random forest classification model using scikit-learn

Learn how to create a random forest classification model using scikit-learn in Python with the sk...

How to create a decision tree classification model using scikit-learn

Learn how to create a decision tree classification model using scikit-learn in Python with the sk...

How to dedupe lists in Python with set() and intersection()

Learn how to dedupe lists in Python using set(), intersection(), and dict.fromkeys() and identify...

How to use DATEDIFF() to calculate date differences in MySQL

Learn how to calculate the number of days between two dates in MySQL using the DATEDIFF() functio...

How to use DATE_ADD() and DATE_SUB() to add and subtract from dates

Learn how to add and subtract from dates in MySQL using the DATE_ADD() and DATE_SUB() functions b...

How to use CASE for flow control in SQL statements

The CASE statement is used to evaluate a condition and return a value based on the result of the ...

How to use DATE_FORMAT() to reformat dates in MySQL

The MySQL DATE_FORMAT() function lets you convert datetime values to other formats. Here's a quic...

How to use string functions in SQL statements

SQL string functions and operators allow you to manipulate strings in SQL statements and are very...

How to use ORDER BY to sort an SQL result set

The SQL ORDER BY clause is used to sort the results of a SELECT statement so you can place column...

How to use GROUP BY and HAVING in SQL statements

The GROUP BY and HAVING clauses are used to group the results of a SELECT statement and are extre...

How to use BETWEEN in SQL statements to return data between two values

Selecting data between two values or dates is easy when you create a BETWEEN AND expression i...

How to use SELECT, FROM, WHERE, and AND in SQL statements

Learn how to query a MySQL database using SELECT, FROM, WHERE, and AND in simple SQL statements.

How to import a MySQL database

Learn how to import a MySQL database onto a MySQL Docker container and query the database using P...

How to read QR codes in Python using OpenCV

Learn how to read QR codes from images in Python using the OpenCV computer vision library's QRCod...

How to calculate month start and month end dates in Python

When creating business reports you'll often need to be able to calculate the month start and mont...

How to calculate ISO week numbers and start and end dates in Python

Most ecommerce and marketing teams use ISO week reporting periods. Here's how to calculate ISO we...

How to forecast Google Trends search data with NeuralProphet

Google Trends data can give retailers insight into the growth or decline of searches in the wider...

How to use dictionaries in Python

Python dictionaries allow you to store key-value pairs, so you can store, lookup, and retrieve da...

How to analyse Average Order Value with Jenks natural breaks classification

Learn how to analyse Average Order Value using the Google Analytics API and Jenks natural breaks ...

How to check if URLs are redirected using Requests

Learn how to use the Python Requests library to check the status code of a URL and identify wheth...

How to use operators in Google Analytics API queries

Google Analytics API query operators let you select or filter specific data from your GA account....

How to calculate abandonment and completion rates using the Google Analytics API

The Google Analytics API does not expose Shopping Behaviour report data such as cart and checkout...

How to use CountVectorizer for n-gram analysis

Learn how to use scikit-learn CountVectorizer for n-gram analysis and analyse unigrams, bigrams, ...

How to use try except for Python exception handling

Learn how to use try, except, else, and finally (or try catch) for Python exception handling to i...

How to use the Pip Python package manager

Learn how to use the Pip Python package manager to install, upgrade, and remove Python packages.

How to calculate the time difference between two dates in Pandas

Learn how to calculate the time difference between two dates in Pandas and return the difference ...

How to use the Dropbox API with Python

The Dropbox API makes it easy to upload, download, share, and delete files using Python and is id...

How to send emails using the Mailchimp Transactional email API

Learn how to send transactional emails from your Python application using the Mailchimp Transacti...

How to Dockerize a data science application

Learn how to Dockerize your data science application in five minutes so your code runs perfectly ...

How to read an XML feed into a Pandas dataframe

Learn how to create an XML feed parser that will read your Google Shopping feed (or any other XML...

How to backup a MySQL database using mysqldump, SSH and SCP

Learn how to backup a MySQL database using mysqldump and SSH and download the dump file to your l...

How to use the Pandas apply function on dataframe rows and columns

The Pandas apply function lets you apply a function to columns or rows in your Pandas dataframe. ...

How to create descriptive statistics using the Pandas describe function

The Pandas describe function lets you quickly create descriptive statistics from a dataframe and ...

How to create Google Search Console time series forecasts using Neural Prophet

Learn how to use the Neural Prophet model and EcommerceTools to create time series forecasts of y...

16 Python web scraping projects for ecommerce and SEO

Here are 16 Python web scraping projects you can use on your ecommerce website to improve marketi...

A quick guide to customer retention

Customer retention rate is one of the most misunderstood marketing metrics. Here's a quick guide ...

How to create a Google rank checker tool using Python

Learn how to create a Google rank checker tool using the Python EcommerceTools package so you can...

How to use the Feefo API for ecommerce competitor analysis

Learn how to use the Feefo API for ecommerce competitor analysis and understand what products com...

How to compare time periods using the Google Search Console API

Learn to use EcommerceTools to query the Google Search Console API with Python and compare two ti...

How to create a Google Service Account client secrets JSON key

Learn how to create a client secrets JSON key file and set up a Google Service Account so you can...

A quick guide to the RFM model for data scientists

The RFM model measures Recency, Frequency, and Monetary value and is used to predict future custo...

How to run time-based SEO tests using Python

Learn how to run SEO tests in Python using EcommerceTools to fetch your Google Search Console dat...

How to create content recommendations using TF IDF

Learn how to use the Term-Frequency Inverse Document Frequency (TF IDF) and cosine similarity to ...

How to create a contractual churn model in scikit-learn

Whether you sell magazine subscriptions, mobile phone contracts, broadband, or car insurance, con...

How to avoid model overfitting with early stopping rounds

Overfitting reduces model performance. Here's how you can avoid it using the XGBoost early stoppi...

A quick guide to customer segmentation for B2B e-commerce

The approaches used for B2B customer segmentation are slightly different from those used in B2C e...

How to detect Google Search Console anomalies

Learn how to use Python to export data from the Google Search Console API and constr...

How to classify customer support tickets using Naive Bayes

Improve the efficiency of your customer service team by creating a Naive Bayes model to classify ...

How to use pipelines in your machine learning models

Using pipelines keeps machine learning code cleaner, easier to maintain, easier to move to produc...

How to infer the effects of marketing using the Causal Impact model

The Causal Impact model lets you examine ecommerce and marketing time series data to understand w...

How to identify SEO keyword opportunities with Python

Learn how to use Python to scrape and parse an XML sitemap, crawl and scrape a site, connect the ...

How to add days and subtract days from dates in Pandas

Learn how to add days and subtract days from dates in Pandas using the Python timedelta function ...

How to analyse Google Analytics demographics and interests with GAPandas

Google Analytics demographics and interests data are a useful way to quickly understand the custo...

How to identify striking distance keywords with Python

Learn how to find striking distance keywords in Google Search Console API data with Python and im...

A quick guide to lead scoring for B2B e-commerce sites

Lead scoring is a commonly used CRM technique in most B2B e-commerce sites. Here's how the variou...

How to trigger marketing automations using the Mailchimp API in Python

By assigning or removing tags to subscribers using the Mailchimp marketing API you can create pow...

How to create monthly Google Search Console API reports with EcommerceTools

Learn how to use EcommerceTools to create monthly Google Search Console API reports that let you ...

How to engineer new features using Decision Tree models

Learn how to use Decision Trees to engineer or derive new features from your existing data and im...

How to use the Mailchimp Marketing Python API with Pandas

Learn how to use the Mailchimp API in Python by creating email marketing reports using Pandas and...

How to use the eBay Finding API with Python

The eBay SDK allows developers to search and retrieve eBay listings using a Python API. Here's ho...

How to export Zendesk tickets into Pandas using Zenpy

Zenpy is an unofficial Zendesk API for Python that allows you to export and update tickets. Here'...

How to query the Google Search Console API with EcommerceTools

EcommerceTools makes it quick and easy to query the Google Search Console API and display the dat...

How to read Google Sheets data in Pandas with GSpread

The GSpread package makes it quick and easy to read Google Sheets spreadsheets from Google Drive ...

How to calculate the Lin Rodnitzky Ratio using GAPandas

The Lin Rodnitzky Ratio is designed to assess paid search account management quality. Here's how ...

How to analyse product replenishment

By identifying products that are regularly replenished and contacting customers when they are run...

Data science courses for budding data scientists and data engineers

If you are considering learning data science there are now hundreds of online data science course...

A quick guide to customer segmentation for data scientists

Customer segmentation is the process of grouping customers based on shared characteristics in ord...

How to create an ecommerce purchase intention model in Python

Ecommerce purchase intention models predict the probability of each customer making a purchase, s...

How to create a classification model using XGBoost in Python

Learn how to create a classification model using XGBoost and scikit-learn in Python by classifyin...

How to predict employee churn using CatBoost

Use CatBoost to create an employee churn model that will predict which of your staff is going to ...

How to read an RSS feed in Python

Learn how to create a Python RSS reader using Requests-HTML to read an RSS feed, parse the fe...

How to auto-generate meta descriptions with EcommerceTools

Learn how to use EcommerceTools to create automated meta descriptions via deep learning and the B...

19 Python SEO projects that will improve your site

Using Python for SEO is really catching on. There are loads of ways you can use Python SEO projec...

How to identify internal and external links using Python

Learn how to identify internal and external links through web scraping in Python and help identif...

How to create a basic Marketing Mix Model in scikit-learn

Marketing Mix Models (MMMs) let you test what marketing results you'll get from changing the amou...

How to scrape Google results in three lines of Python code

Here's a quick and easy way to scrape Google search engine results into a Pandas dataframe in jus...

How to make time series forecasts with Neural Prophet

The Neural Prophet time series forecasting model builds on Facebook's Prophet and is a powerful tool...

How to create a simple product recommender system in Pandas

Learn how to create a product recommender or product recommendation system in Python using Pandas...

15 ways you can use data science to boost ecommerce performance

There are dozens of use cases for ecommerce data science, covering everything from segmentation t...

How to create PDF reports in Python using Pandas and Gilfoyle

To save time, I created a Python package for generating PDF reports and presentations. Here's how...

How to create monthly Google Analytics reports in Pandas

Here's how you can use GAPandas to create monthly analytics reports on marketing and ecommerce da...

How to segment your customers using EcommerceTools

EcommerceTools makes it quick and easy to segment your customers using a range of powerful techni...

How to use EcommerceTools for technical SEO

The EcommerceTools package lets you check SERPs, examine robots.txt files, analyse Core Web Vital...

How to use the Isolation Forest model for outlier detection

Learn how to use the Isolation Forest or iForest algorithm in sklearn for automated outlier detec...

How to use k means clustering for customer segmentation

K means clustering is one of the most widely used machine learning algorithms and is well-suited to custo...

How to segment customers using RFM and ABC

Creating a value-based segmentation using RFM and ABC is a great way to tell the good customers f...

How to perform a customer cohort analysis in Pandas

Customer cohort analysis examines differences between customers over time and is a powerful tool ...

How to machine translate product descriptions

Machine translation systems, such as Google Translate, make it quick and easy to bulk translate p...

How to identify the causes of customer churn

Discover how to identify the causes of customer churn using Cox's Proportional Hazards model, so ...

How to identify near duplicate content using LMS

Learn how to detect near duplicate content using the Longest Matching Subsequence (LMS) technique...

How to create ecommerce anomaly detection models

Learn how to use the Anomaly Detection Toolkit (ADTK) to identify anomalies in ecommerce data ext...

How to create a product and price metadata scraper

Learn how to keep tabs on your competitors' pricing by building an ecommerce price scraper that u...

How to create a non-contractual churn model for ecommerce

Learn how to create a non-contractual churn model to let you predict churn and identify which cus...

How to classify customer service emails with Bart MNLI

Make your ecommerce customer service team more efficient by classifying their support emails auto...

How to calculate CLV using BG/NBD and Gamma-Gamma

Calculating Customer Lifetime Value is hard to do right. Learn how to calculate CLV using the BG/...

How to auto-generate product summaries using deep learning

Learn how to use Transformer models to automatically generate summaries from ecommerce product de...

How to assign RFM scores with quantile-based discretization

Use quantile-based discretization and K-means clustering to assign RFM scores to your customer...

How to assess product copy using EQA models

Learn how to use Extractive Question Answering or EQA models to assess the quality of your ecomme...

How to analyse product consumption and repurchase rates

Learn how to shape your product, content, and pricing strategy by analysing product consumption a...

How to use Spintax to create content and ad copy in Python

Although Spintax was mainly used for the production of low quality articles, and emails from Nige...

How to use bagging, boosting, and stacking in ensembles

Learn how to use ensemble models that utilise bagging, boosting, and stacking to generate better ...

How to scrape schema.org metadata using Python

Learn to scrape more efficiently by extracting Schema.org metadata in JSON-LD, Microdata, and Ope...

How to scrape People Also Ask data using Python

Google’s People Also Ask or PAA boxes are increasingly common for popular search terms and are wo...

How to scrape Google search results using Python

Learn to scrape Google search results using Python and save loads of time and collect data that a...

How to perform time series decomposition

Time series decomposition lets you separate the trend and seasonality in your data so you can see...

How to join Google Analytics and Google Search Console data

Learn how to use Python to connect your Google Search Console API data to your Google Analytics R...

How to identify SEO keywords using Google Autocomplete

Learn how to use Python to identify the most popular SEO keywords linked to your search term by s...

How to find spelling and grammar issues on product pages

Spelling and grammar issues on product detail pages can make your site look unprofessional. Here’...

How to engineer customer purchase latency features

Learn how to engineer customer purchase latency features based on the time between each customer'...

How to create targeted B2B company sector datasets

Learn how to create targeted B2B company datasets for free using Python, Pandas, and Companies Ho...

How to create a UK data science jobs dataset

Want to analyse the data science and data engineering job market? Here's a quick guide to buildin...

How to create a product matching model using XGBoost

Product matching algorithms find identical products on ecommerce sites so users can compare produ...

How to create a Naive Bayes product classification model

Learn how to use NLP techniques to create a Multinomial Naive Bayes sklearn product classificatio...

How to create a dataset containing all UK companies

B2B ecommerce retailers spend large amounts on acquiring the addresses of potential customers to ...

How to count indexed pages using Python

Learn how to use Python to count the number of indexed pages a website has to help you monitor it...

How to calculate safety stock and reorder point

The safety stock calculation and reorder point calculation can greatly reduce the likelihood of c...

How to calculate operations management metrics in Python

Understand the most important metrics for operations managers and learn how to calculate them in ...

How to calculate marketing metrics in Python

Learn how to calculate marketing metrics such as CPM, CPC, conversion rate, ROMI, ROI, ROAS, CPO,...

How to calculate customer experience metrics in Python

Customer experience metrics and customer satisfaction metrics drive customer retention, so it's v...

How to calculate category management metrics in Python

Category management metrics can let you understand product sales and be more strategic in your pr...

How to access the Google Knowledge Graph Search API

The Google Knowledge Graph powers the Knowledge Panels and infobox elements of Google’s search re...

A quick guide to catalogue marketing data science

Catalogues may be living on borrowed time, but catalogue marketing data science techniques have b...

How to use knee point detection in k-means clustering

Use the Kneedle algorithm to detect the knee or elbow point when k-means clustering so you define...

How to use Extruct to identify Schema.org metadata usage

Extruct allows you to reveal a site's Schema.org metadata implementation, so you can build a more...

How to unzip files with Python

If you're downloading large zipped datasets via automated Python scripts, you may need to unzip o...

How to unserialize serialized PHP arrays using Python

PHP serialized arrays and objects are common in ecommerce database schemas. This is how you unser...

How to send data to Google Analytics in Python with PyGAMP

PyGAMP allows you to insert data into Google Analytics using the Measurement Protocol API in Pyth...

How to scrape Open Graph protocol data using Python

Learn how to use web scraping technologies, including urllib and Beautiful Soup, to scrape a webs...

How to scrape and parse a robots.txt file using Python

The robots.txt file includes potentially useful information for crawlers and spiders, and is easy t...

How to scrape a site's page titles and meta descriptions

Learn how to apply web scraping tools to scrape a site's content and parse the page titles and me...

How to scan a site for 404 errors and 301 redirect chains

404 errors and 301 redirect chains can be damaging to the performance of a website and impact the...

How to resize and compress images in Python with the TinyPNG API

Learn how to use the TinyPNG API in Python to bulk resize and compress images to improve site per...

How to preprocess text for NLP in four easy steps

Learn how to apply tokenization, stopword removal, Porter stemming, and re-joining to preprocess ...

How to parse XML sitemaps using Python

XML sitemaps are a great way to gain insight into your competitors’ websites and identify pages to ...

How to parse URL structures using Python

When analysing web data, it’s common to need to parse URLs and extract the domain, directories, q...

How to identify keyword cannibalisation using Python

Learn how to use Python to identify keyword cannibalisation which occurs when multiple pages comp...

How to download files with Python

The Python urllib package allows you to download files from remote servers to use in your project...

How to detect sarcasm using machine learning

Can you tell when someone is taking the piss, when they haven't used a winking smiley? In this pr...

How to detect fake news with machine learning

Learn the Natural Language Processing techniques you need to use to identify fake news from real ...

How to calculate Economic Order Quantity in Python

Learn how to calculate the Economic Order Quantity or EOQ for a product to minimise holding costs...

How to build a web scraper using Requests-HTML

Requests-HTML wraps up the best bits from Requests and Beautiful Soup packages to create a web sc...

How to audit a site's Core Web Vitals using Python

Core Web Vitals are performance metrics that measure the quality of the user experience and are n...

How to analyse Pandas dataframes using SQL with PandaSQL

Learn how to use PandaSQL and query the data in your Pandas dataframes using SQL queries instead ...

How to analyse non-ranking pages and search index bloat

Learn how you can use Python to identify how many non-ranking pages your site has and check wheth...

How to access the Google Search Console API using Python

By accessing Google Search Console API data using Python you'll have access to whatever data you ...

A quick guide to search intent classification for SEO

Search intent classification aims to categorise search queries by user intent. But how do you do ...

How to use Screaming Frog from the command line

The Screaming Frog SEO Spider is widely used in digital marketing and ecommerce and has a powerfu...

How to send a Slack message in Python using webhooks

It’s really easy to send Slack messages using Python. In this project, we’ll create a really basi...

How to geocode and map addresses using GeoPy

Learn how to use GeoPy, Nominatim, and Folium to geocode and plot Pizza Express branches in the v...

How to create paid search keywords using Pandas

Pandas is a powerful tool for marketers, especially those involved in paid search advertising. He...

How to create a Python web scraper using Beautiful Soup

Beautiful Soup is one of the most powerful libraries for performing web scraping in Python. Here'...

The difference between data scientists and data engineers

Despite the growing demand, many people still don’t understand the difference between a data scie...

How to write better code using DRY and Do One Thing

Learn how to use the Don’t Repeat Yourself and Do One Thing techniques to help you create Python ...

How to visualise data with quirky hand-drawn plots

Want to dumb down your plots and charts for your target audience? CuteCharts allows you to create...

How to visualise conversion funnels with Plotly

Funnels are one of the most useful and intuitive data visualisations used in ecommerce and market...

How to use style guidelines to improve your Python code

Learn how and why following Python style guidelines can make your code easier to understand, revi...

How to use SQLite in Python

The SQLite relational database management system is fast, lightweight, and easy to use. Here's ho...

How to use operators in Python

Python operators are one of the most important components of the language to grasp. Here’s a basi...

How to use lists in Python

Lists are one of the most widely used data storage objects or data types within Python and are us...

How to use Git for your data science projects

Learn how to use Git for your data science projects so you can keep your code backed up and share...

How to use docstrings to improve your Python code

Using docstrings in Python makes it easier to see what functions do, what arguments they accept, ...

How to use the Pandas value_counts() function

The Pandas value_counts() function is great for calculating the number of occurrences of a value ...

How to query MySQL and other databases using Pandas

Querying MySQL and other databases using Pandas in Jupyter notebooks will change the way you work...

How to open, read, and write to files in Python

Python makes it very straightforward to open, read, and write data to files. Here's a quick guide...

The four Python data science libraries you need to learn

If you're learning data science, there are four Python data science libraries you absolutely need...

How to visualise text data using word clouds in Python

Word clouds, tag clouds, or wordles are an intuitive way to present text data to non-technical pe...

How to visualise statistical distributions with Seaborn

Understanding the statistical distribution of data is a crucial step in machine learning. Here’s ...

How to visualise data using Venn diagrams in Matplotlib

The Venn diagram is one of the most intuitive data visualisations for showing the overlap between...

How to visualise data using line charts in Seaborn

Line charts or line plots are among the most commonly used graphs in data science. Here’s how you...

How to visualise data using barplots in Seaborn

Learn how to create barplots or bar charts for comparing and visualising categorical data in Pyth...

How to visualise correlations using Pandas and Seaborn

Machine learning models make predictions from correlations between features and the target, so fi...

How to visualise categorical data in Seaborn

There’s more to visualising categorical data than bar charts. Here’s a selection of the other cha...

How to install the NVIDIA Data Science Stack on Ubuntu 20.04

The NVIDIA Data Science Stack is the quickest way to set up the drivers and packages needed for GP...

How to create desktop data science apps using Nativefier

Nativefier makes it easy to create Ubuntu desktop applications from websites using Electron. Here...

How to create an Ubuntu desktop entry to run Jupyter

Here's how you can create a Gnome desktop entry shortcut launcher icon to start up Docker and ope...

How to create a dataset for product matching models

Datasets for the product matching models required to verify price comparisons are hard to find. H...

How to build a data science workstation

Building your own data science workstation or deep learning workstation isn’t that difficult and ...

How to visualise analytics data using heatmaps in Seaborn

Heatmaps make visualising temporal data much easier. Here’s how you can create custom web analyti...

How to visualise RFM data using treemaps

Learn how to assign simple labels to your RFM data and visualise them using treemaps to help make...

How to visualise data using scatterplots in Seaborn

Scatterplots are a great way to visualise the distribution of data and the relationship between t...

How to visualise data using histograms in Pandas

Pandas histograms are one of the best ways to visualise the statistical distributions of data dur...

How to visualise data using boxplots in Seaborn

The Seaborn boxplot, or box-and-whisker diagram, is a great way to visualise the statistical dist...

How to use SMOTE for imbalanced classification

SMOTE, the Synthetic Minority Oversampling Technique, is one of the best ways to handle imbalance...

How to use Recursive Feature Elimination in your models using RFECV

Matt Clarke explains how you can use Recursive Feature Elimination with Cross Validation or RFECV...

How to use model selection and hyperparameter tuning

Model selection and hyperparameter tuning can greatly improve model performance. Learn how to use...

How to transform categorical variables using encoders

Learn how to use Category Encoders to transform and convert categorical variables to numeric data...

How to select, filter, and subset data in Pandas dataframes

Learn a range of useful techniques to select, filter, and subset data stored in Pandas dataframes...

How to save and load machine learning models using Pickle

Machine learning or ML models can take days to train. Pickle save and Pickle load allow you to s...

How to resample time series data in Pandas

The Pandas resample function lets you group time series data by day, week, month, or year so it c...

How to reformat dates in Pandas

Learn how to use Python and Pandas to reformat dates and datetimes so you can display them in you...

How to import data into Pandas dataframes

Learn how to import data into Pandas from a wide range of different data sources, from CSV and Ex...

How to group and aggregate transactional data using Pandas

Learn how to group and aggregate transactional data using Pandas to create new datasets allowing ...

How to create ecommerce sales forecasts using Prophet

Creating accurate ecommerce time series forecasts using models such as ARIMA can be tricky. The P...

How to create a response model to improve outbound sales

Learn how to improve outbound sales using a machine learning response model that maximises your s...

How to create a linear regression model using Scikit-Learn

Want to get started with sklearn linear regression? Learn to use Python, Pandas, and scikit-learn...

How to analyse search traffic using the Google Trends API

Google Trends data is now being used in a range of models. Here’s how you can access the data usi...

How to identify visually similar images using hashing

Learn how to use image hashing or image fingerprinting to find visually similar images or duplica...

How to create an ABC XYZ inventory classification model

The ABC XYZ inventory classification model is built on top of ABC inventory analysis and helps yo...

How to use Google Secret Manager to improve data security

Learn how to use Google Secret Manager to create secure environmental variables to hold your sens...

How to speed up the NLP text annotation process

Text annotation techniques like sequence labeling are vital in NLP, but are tedious, time-consum...

How to import data into Google Data Studio using Python

Google Data Studio doesn’t include native support for Python, but you can still import data from ...

How to import data into BigQuery using Pandas and MySQL

Learn how to import data into the Google BigQuery serverless data warehouse platform using Python...

How to create synthetic data sets for machine learning

Learn some simple techniques you can apply using Pandas and NumPy to create dummy, synthetic, or ...

How to create image datasets for machine learning models

Learn how to create image datasets for machine learning image classification models using Python ...

How to create an ABC inventory classification model

Learn how to create an ABC inventory classification model in Python so your procurement manager t...

How to connect to MySQL via an SSH tunnel in Python

MySQL databases are usually configured to only allow secure connections via SSH. Here’s how to cr...

How to calculate relative dates for Google Analytics queries

To automate Google Analytics API reports for Google Data Studio you’ll need to know how to calcul...

How to bin or bucket customer data using Pandas

Data binning or bucketing is a very useful technique for both preprocessing and understanding or ...

How to annotate training data for NLP models using Doccano

Doccano is a text annotation platform for NLP that makes it much quicker and easier to label and ...

Ecommerce and marketing data sets for machine learning

Here’s a selection of some of the most useful datasets I’ve found for building machine learning m...

How to use the BG/NBD model to predict customer purchases

The Beta-Geometric Negative Binomial Distribution or BG/NBD model lets you predict which customer...

How to use NLP to identify what drives customer satisfaction

Learn how to use web scraping and NLP to shape your ecommerce strategy by identifying what influe...

How to create a BI platform using Apache Superset

Learn how to create a powerful and extensible business intelligence (BI) platform for your ecomme...

How to use Apache Druid for real-time analytics data storage

Apache Druid is a real-time high performance analytics data store for big data that makes running...

How to set up a Docker container for your MySQL server

Learn how to create a Docker container for your MySQL or MariaDB database server so you can extra...

How to use Category Encoders to encode categorical variables

Category Encoders make it much easier to encode categorical variables during the machine learning...

How to create ecommerce data pipelines in Apache Airflow

Learn how to create an Apache Airflow data pipeline and see why it is one of the most widely used...

How to create an ecommerce trading calendar using Pandas

Learn how to use Pandas to create a dynamic ecommerce trading calendar of special trading events,...

Dell Precision 7750 mobile data science workstation review

The Dell Precision 7750 mobile workstation is aimed at data scientists who want a laptop for GPU-...

A quick guide to Product Attribute Extraction models

Learn why ecommerce retailers and marketplaces are creating Product Attribute Extraction (PAE) mo...

A quick guide to Next-Product-To-Buy models

Next-Product-To-Buy or NPTB models can predict not only what a customer will buy, but also when t...

A quick guide to machine learning

Machine learning (ML) is a branch of artificial intelligence (AI) and allows models to make predi...

A quick guide to machine learning uplift models

Unlike response or propensity models, uplift models let you identify customers who will only buy ...

A quick guide to Learning to Rank models

Learning to Rank or LTR models improve the performance of on-site search results on ecommerce web...

How to use the Pandas melt function to reshape wide format data

Learn how to use the Pandas melt function to reshape wide format data so you can use it in your m...

How to use the Apriori algorithm for Market Basket Analysis

Learn how to use the mlxtend Apriori algorithm to run a Market Basket Analysis on Google Analytic...

How to use Natural Language Understanding models

Learn how to use Natural Language Understanding models (NLU) via PyTorch and Hugging Face Transfo...

How to use Docker for your data science projects

Learning to use Docker for data science projects will make configuring, deploying, and sharing mo...

How to tune model hyper-parameters with grid search

Every scikit-learn model has hyper-parameters you can tune to obtain improvements. Here’s how to ...

How to test your Keras, CUDA, CuDNN, and TensorFlow install

Setting up TensorFlow, Keras, CUDA, and CuDNN can be a painful experience on Ubuntu 20.04. Here i...

How to scrape JSON-LD competitor reviews using Extruct

Here's how you can use Python, Selenium, and Extruct to create a headless web browser and scrape ...

How to scrape competitor technology data in Python

Learn how to automate the collection of website technology data from your competitors using Built...

How to perform facial recognition in Python

Facial recognition is now very effective and has become part of everyday life. Here's how to use ...

How to separate audio source data using Spleeter

Learn how to use Deezer's TensorFlow powered Spleeter model to separate music into vocals and acc...

How to create a Pandas dataframe

Pandas lets you create dataframes from almost any type of data, including lists, dictionaries, tu...

How to create a collaborative filtering recommender system

Learn how to use item-based and user-based collaborative filtering to create a powerful recommend...

How to build the 'Hotdog, not Hotdog' image classifier

Learn how to classify images using Keras and TensorFlow by building the 'Hotdog, Not Hotdog' Conv...

How to create a neural network for sentiment analysis

Learn how to use a recurrent neural network and the Long Short-Term Memory model to analyse senti...

How to use GAPandas to view your Google Analytics data

Learn how to use GAPandas to query the Google Analytics API and view, analyse, and visualise your...

How to use your GPU to accelerate XGBoost models

Do your XGBoost machine learning models take an age to run? You could make them several times fas...

How to use scikit-learn datasets in data science projects

To learn data science techniques you’ll need the right kind of datasets. Thankfully, many are eas...

How to use Python regular expressions to extract information

Regular expressions, or regexes, are widely used in data science for matching specific patterns i...

How to use mean encoding in your machine learning models

Learn how to use the mean encoding technique to generate powerful new features from your data to ...

How to interpret the confusion matrix

The confusion matrix can tell you more about your model than the accuracy score. We build a model...

How to impute missing numeric values in your dataset

Cleverly filling in the gaps when numeric data is missing from your dataset can often boost the p...

How to engineer date features using Pandas

In time series datasets dates often hold the key to improving performance, but they need to be tr...

How to create a Python virtual environment for Jupyter

Learn how to create a Python virtual environment for your Jupyter notebook using venv and virtual...