Noun phrase extraction is a Natural Language Processing technique that can be used to identify and extract noun phrases from text. Noun phrases are phrases that function grammatically as nouns...
Spacy’s EntityRuler component is one of several rule-based matcher components that can be used to extend the core functionality of the package. It’s really useful for the creation of custom...
As I showed in my previous tutorial on named entity recognition in Spacy, the EntityRuler lets you customise Spacy’s default NER model so that you can create your own...
Spacy is one of the most popular Python packages for Natural Language Processing. Alongside the Natural Language Toolkit (NLTK), Spacy provides a huge range of functionality for a wide variety...
OpenAI Whisper is a new open source automatic speech recognition (ASR) model from OpenAI, the organisation that also brought us the incredible GPT-3 language models. Like GPT-3, it’s...
The Natural Language Toolkit (NLTK) is a powerful Python package for performing a wide range of common NLP tasks, including Part of Speech tagging, or POS tagging for short.
Fake reviews seem to be everywhere these days, leaving customers unsure over which products or businesses are actually any good. Whether you’re shopping on Amazon, checking out a restaurant on...
Tokenization is a Natural Language Processing technique that breaks a sentence up into a list of distinct tokens, such as words or values. It’s a crucial first step in...
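To make the idea concrete, here is a minimal sketch of word-level tokenization using only Python's standard library. The `tokenize` function and its regular expression are my own illustrative assumptions rather than the approach of any particular NLP package, which will usually handle punctuation, contractions, and Unicode far more carefully.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens as a list."""
    # \w+ matches runs of letters, digits, and underscores. The optional
    # group keeps simple contractions such as "isn't" together as one token.
    return re.findall(r"\w+(?:'\w+)?", text.lower())


tokens = tokenize("It's a crucial first step, isn't it?")
print(tokens)
```

Punctuation is discarded and everything is lowercased here, which is a design choice that suits simple bag-of-words models but throws away information other tasks may need.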
Naive Bayes classifiers are commonly used for machine learning text classification problems, such as predicting the sentiment of a tweet, identifying the language of a piece of text, or categorising...
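As a rough illustration of how this kind of text classifier works, the sketch below implements a tiny multinomial Naive Bayes model with add-one (Laplace) smoothing in plain Python. The `train` and `predict` functions, the whitespace tokenization, and the four toy reviews are all hypothetical and exist only for this example; in practice you would normally reach for a library implementation.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count labels and per-label words from a list of (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior: how common this label is overall.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            # Add-one smoothing stops unseen words zeroing out the score.
            numerator = word_counts[label][token] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
examples = [
    ("great product love it", "positive"),
    ("really great service", "positive"),
    ("terrible product hate it", "negative"),
    ("really terrible service", "negative"),
]
model = train(examples)
print(predict("great service", model))
print(predict("terrible product", model))
```

Working in log space avoids the numerical underflow you would get from multiplying lots of small probabilities together.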
CountVectorizer is a scikit-learn class that uses count vectorization to convert a collection of text documents to a matrix of token counts. Given a corpus of text documents, such as...
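The underlying idea of count vectorization can be sketched in a few lines of plain Python. To be clear, this is not scikit-learn's implementation: the `count_vectorize` function is a hypothetical helper that simply lowercases each document, splits it on whitespace, and returns a dense list of lists rather than a sparse matrix.

```python
from collections import Counter


def count_vectorize(corpus):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    tokenized = [doc.lower().split() for doc in corpus]
    # A sorted vocabulary gives every term a stable column position.
    vocabulary = sorted({token for doc in tokenized for token in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


corpus = ["the cat sat", "the cat saw the dog"]
vocabulary, matrix = count_vectorize(corpus)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row represents one document and each column one vocabulary term, so the second row records that "the" appears twice in the second document.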
After work, when I’m not learning about data science, practising data science, or writing about data science, I like to browse classic car auction sites looking for cars I can’t...
In ecommerce, customer service staff are often among the busiest people in the organisation, handling hundreds of tasks every day, often simultaneously. However, CS managers often get so bogged down...
Meta descriptions are strings of text added to the head of an HTML document to describe its content to search engines and search engine users and are of critical importance...
Whether you’re analysing content written in other languages using Natural Language Processing, or you want to assist your content team by translating their writing into other languages, machine translation software...
Zero-shot learning, or ZSL, is a machine learning process commonly used for Natural Language Processing that allows you to generate predictions on unseen data without the need to train a...
Several years ago, in one of my first Ecommerce Director roles, I worked with the ex-Myprotein founder to launch sports nutrition brand GoNutrition. As a “bootstrapped” startup, we were low...
In ecommerce, writing good product copy is both an art and a science. Not only does product copy need to be written in the correct tone and style for your...
Product matching or data matching is a computational technique employing Natural Language Processing and machine learning which aims to identify identical products being sold on different websites, where product names...
Assigning products to the right categories is crucial to allowing customers to find what they’re looking for, so product classification models are commonly used by online marketplaces to ensure that...
There’s often a lot of repetition in many data science projects. In tasks that utilise Natural Language Processing (or NLP), for example, you’ll always need to preprocess your text to...
I love sarcasm, but unfortunately I have a shaky ability to detect it in the voices of others, an aptitude for mistaking serious comments for sarcasm and then inappropriately...
Long before Donald Trump erroneously applied it to mean “news that he didn’t agree with”, the term “fake news” referred to disinformation and misleading editorial content. In recent years, it’s...
When you’re building a Natural Language Processing model, it’s the text annotation process which is the most laborious and the most expensive for your business. While you can use tools...
Whether you’re performing product attribute extraction, named entity recognition, product matching, product categorisation, review sentiment analysis, or you are sorting and prioritising customer support tickets, NLP models can be extremely...
While some people might naively interpret it as negativity, I think one of the best ways you can improve an ecommerce business is to focus on the stuff you’re not...
Product attributes, such as size, weight, wattage, or colour, are critical in ecommerce as they help customers find and select the right product for their needs. However, obtaining, adding, and...
Hugging Face Transformers are a collection of State-of-the-Art (SOTA) natural language processing models produced by the Hugging Face group. Basically, Hugging Face take the latest models covered in current natural...
Sentiment analysis, or opinion mining, is a form of emotion AI that uses natural language processing and computational linguistics to analyse text and infer its sentiment. Sentiment analysis has loads...