XGBoost

Articles tagged XGBoost

How to avoid model overfitting with early stopping rounds

One issue with the more sophisticated algorithms, such as Extreme Gradient Boosting, is that they can overfit to the data. This basically means that the model picks up the idiosyncrasies...

How to engineer new features using Decision Tree models

One interesting technique in feature engineering is the use of Decision Trees (and other models) to create or derive new features using combinations of features from the original dataset. Here,...

How to create a classification model using XGBoost in Python

The XGBoost or Extreme Gradient Boosting algorithm is a decision tree based machine learning algorithm which uses a process called boosting to help improve performance. Since it’s introduction, it’s become...

How to use the Isolation Forest model for outlier detection

Outliers, or anomalies, can impact the accuracy of both regression and classification models, so detecting and removing them is an important step in the machine learning process. On larger datasets,...

How to use bagging, boosting, and stacking in ensembles

Ensemble models combine the predicitions of several different models to produce a single prediction, often with better results than can be achieved with a single model alone. There are several...

How to create a product matching model using XGBoost

Product matching or data matching is a computational technique employing Natural Language Processing and machine learning which aims to identify identical products being sold on different websites, where product names...

How to detect fake news with machine learning

Long before Donald Trump erroneously applied it to mean “news that he didn’t agree with”, the term “fake news” referred to disinformation and misleading editorial content. In recent years, it’s...

How to use SMOTE for imbalanced classification

Imbalanced classification problems, such as the detection of fraudulent card payments, represent a significant challenge for machine learning models. When the target class, such as fraudulent transactions, makes up such...

How to use model selection and hyperparameter tuning

There are many techniques you can apply to improve the performance of your machine learning models, but two of the most powerful are model selection and hyperparameter tuning. As models...

How to use transform categorical variables using encoders

There are loads of different ways to convert categorical variables into numeric features so they can be used within machine learning models. While you can perform this process manually on...

How to save and load machine learning models using Pickle

Machine learning models often take hours or days to run, especially on large datasets with many features. If your machine goes off, you’ll lose your model and you’ll need to...

How to tune model hyper-parameters with grid search

Although scikit-learn’s machine learning estimator models can be used out-of-the-box with no tuning, you can usually generate further improvements with a little of tweaking. Each estimator class accepts arguments called...

How to use your GPU to accelerate XGBoost models

If you’re not fortunate enough to have a really powerful data science workstation for your work, one of the problems you’ll likely face is that your models can take quite...

How to use mean encoding in your machine learning models

When you’re building a machine learning model, the feature engineering step is often the most important. From your initial small batch of features, the clever use of maths and stats...

How to interpret the confusion matrix

As a practical demonstration of how the confusion matrix works, lets load up the Wisconsin Breast Cancer dataset, create a classification model and examine the confusion matrix to see how...