How to tune a CatBoostClassifier model with Optuna

Picture by Cong H, Pexels.


The CatBoost model is a gradient boosting model based on decision trees, much like XGBoost, LightGBM, and other tree-based models. It is very popular for tabular data and is often used in Kaggle competitions. By default it builds symmetric (oblivious) trees, which are very fast to evaluate, so it can also be used for real-time predictions.

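In this tutorial I’ll provide example code so you can train a CatBoostClassifier model and then use Optuna to optimize its hyperparameters. Optuna is a powerful and easy-to-use package for hyperparameter optimization. Where GridSearchCV tries every combination in a predefined grid and RandomizedSearchCV samples combinations independently at random, Optuna’s default sampler uses the results of earlier trials to choose promising values, so it usually reaches good settings in fewer trials.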

!pip3 install catboost
!pip3 install optuna

Load the packages

For this tutorial we’ll be using the CatBoostClassifier model from CatBoost, the Optuna package for hyperparameter optimization, and the Pickle package to save our trained model. To evaluate the performance of our classifier we’ll use the accuracy_score and classification_report functions from scikit-learn.

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.datasets import load_wine
import optuna
from optuna.samplers import TPESampler
import catboost
import pickle

Load the data

To keep things simple and allow us to focus on the task of training and tuning the CatBoost classifier, we’ll use the wine dataset from sklearn. This dataset contains 178 samples, 13 numeric features and 3 classes. The goal is to predict the class of a wine from its features. We’ll load it with the load_wine() function, passing return_X_y=True and as_frame=True so that it returns the features as a Pandas dataframe and the target as a Pandas series.

X, y = load_wine(return_X_y=True, as_frame=True)
X.sample(5)

	alcohol	malic_acid	ash	alcalinity_of_ash	magnesium	total_phenols	flavanoids	nonflavanoid_phenols	proanthocyanins	color_intensity	hue	od280/od315_of_diluted_wines	proline
33	13.76	1.53	2.70	19.5	132.0	2.95	2.74	0.50	1.35	5.40	1.25	3.00	1235.0
73	12.99	1.67	2.60	30.0	139.0	3.30	2.89	0.21	1.96	3.35	1.31	3.50	985.0
29	14.02	1.68	2.21	16.0	96.0	2.65	2.33	0.26	1.98	4.70	1.04	3.59	1035.0
48	14.10	2.02	2.40	18.8	103.0	2.75	2.92	0.32	2.38	6.20	1.07	2.75	1060.0
166	13.45	3.70	2.60	23.0	111.0	1.70	0.92	0.43	1.46	10.68	0.85	1.56	695.0
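A quick sanity check confirms what load_wine() returned. This is a minimal sketch that simply re-loads the data so it runs on its own:

```python
import pandas as pd
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True, as_frame=True)

# as_frame=True gives a DataFrame of features and a Series of class labels
print(isinstance(X, pd.DataFrame), isinstance(y, pd.Series))  # True True
print(X.shape)      # (178, 13): 178 wines, 13 features
print(y.nunique())  # 3 classes
```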

Examine the target variable

If you use the Pandas value_counts() function on the target variable y, you’ll see that this dataset has three classes. They are mildly imbalanced, with 71, 59 and 48 samples, but this won’t be a major problem for CatBoost.

y.value_counts()

1    71
0    59
2    48
Name: target, dtype: int64
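To put a number on "mildly imbalanced", you can ask value_counts() for proportions with normalize=True and compare the largest class with the smallest. The max/min ratio below is just one simple imbalance measure chosen for illustration:

```python
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True, as_frame=True)

# Counts per class, and the same counts as proportions of all 178 samples
counts = y.value_counts()
proportions = y.value_counts(normalize=True)
print(proportions.round(3))

# A simple imbalance measure: largest class size divided by smallest
imbalance_ratio = counts.max() / counts.min()
print(round(imbalance_ratio, 2))  # 71 / 48, about 1.48
```

A ratio this close to 1 is far from the heavily skewed datasets where resampling or class weights become necessary.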

Split the data into training and test sets

Next we’ll split the data into training and test sets. We’ll use 70% of the data for training and 30% for testing by setting the test_size parameter to 0.3. The random_state parameter is set to 1 to make the split reproducible; if you omit it, you’ll get a different split each time you run the function.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
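Because the classes are not perfectly balanced, you may also want to pass stratify=y to train_test_split(), which splits each class separately so the class proportions stay roughly the same in both sets. This is an optional variation, not how the results in this tutorial were produced; the _s variable names are just illustrative:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True, as_frame=True)

# stratify=y keeps each class's share of the data similar in train and test
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(len(X_train_s), len(X_test_s))  # 124 54
print(y_test_s.value_counts(normalize=True).round(2))
print(y.value_counts(normalize=True).round(2))
```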

Create the CatBoostClassifier model

Now we have our dataset sorted, we can create and train a CatBoostClassifier model. This will be a simple base model with no hyperparameter tuning. We’ll define the model, then fit it to the training data. It should train quickly as this dataset is very small. Once that’s done, we can generate some predictions from the test data.

model = catboost.CatBoostClassifier(verbose=False)
model.fit(X_train, y_train)

<catboost.core.CatBoostClassifier at 0x7f4bbbab73d0>

y_pred = model.predict(X_test)

Evaluate the model

There are a couple of scikit-learn functions we can use to evaluate the model. The first is accuracy_score, which returns the proportion of test samples the model classified correctly. The second is classification_report, which returns a report with the precision, recall, and F1 score for each class. As you can see, the base CatBoostClassifier is already very good before any hyperparameter tuning: it scores 98% accuracy, misclassifying just one of the 54 test samples.

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98        23
           1       1.00      0.95      0.97        19
           2       1.00      1.00      1.00        12

    accuracy                           0.98        54
   macro avg       0.99      0.98      0.98        54
weighted avg       0.98      0.98      0.98        54

print(accuracy_score(y_test, y_pred))

0.9814814814814815
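Accuracy is simply the fraction of predictions that match the true labels, so 0.98148 is 53 correct predictions out of 54. A small sketch with made-up labels (purely illustrative, mimicking the class sizes in the report above) shows that accuracy_score agrees with computing that fraction by hand:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels: 54 test samples, with exactly one wrong prediction
y_true = np.array([0] * 23 + [1] * 19 + [2] * 12)
y_pred = y_true.copy()
y_pred[30] = 0  # one class-1 sample misclassified as class 0

manual = np.mean(y_true == y_pred)  # fraction of matching labels
print(manual, 53 / 54)
print(accuracy_score(y_true, y_pred))
```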

Use Optuna to find the best hyperparameters

To try to eke extra performance out of our model and improve its accuracy, we’ll now use the Optuna hyperparameter tuning library to find the best hyperparameters for our model. To get started, we need to create a custom objective function for our CatBoostClassifier model.

This function receives an Optuna trial object and uses its suggest_int(), suggest_float() and suggest_categorical() methods to sample a value for each hyperparameter we want to tune. It then trains a model with those values and returns the model’s accuracy on the test set. Optuna runs this function many times, using the scores from earlier trials to choose which hyperparameter values to try next.

def objective(trial):
    model = catboost.CatBoostClassifier(
        iterations=trial.suggest_int("iterations", 100, 1000),
        learning_rate=trial.suggest_float("learning_rate", 1e-3, 1e-1, log=True),
        depth=trial.suggest_int("depth", 4, 10),
        l2_leaf_reg=trial.suggest_float("l2_leaf_reg", 1e-8, 100.0, log=True),
        bootstrap_type=trial.suggest_categorical("bootstrap_type", ["Bayesian"]),
        random_strength=trial.suggest_float("random_strength", 1e-8, 10.0, log=True),
        bagging_temperature=trial.suggest_float("bagging_temperature", 0.0, 10.0),
        od_type=trial.suggest_categorical("od_type", ["IncToDec", "Iter"]),
        od_wait=trial.suggest_int("od_wait", 10, 50),
        verbose=False
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred)

Create the study

Next we need to create an Optuna study and run our objective function through it. We’ll use the TPE sampler, which samples the hyperparameter space using the Tree-structured Parzen Estimator algorithm. TPE is already Optuna’s default sampler, but creating it explicitly lets us pass seed=1 so the search is reproducible. We’ll also set direction="maximize", since we want to maximise the accuracy score, and run 100 trials. To avoid getting a log message every time a trial finishes, I’ve set Optuna’s logging verbosity to WARNING.

optuna.logging.set_verbosity(optuna.logging.WARNING)

sampler = TPESampler(seed=1)
study = optuna.create_study(study_name="catboost", direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=100)

Evaluate the trial

After a couple of minutes, depending on the speed of your workstation, Optuna will have worked through the trials, sampling hyperparameter values from the ranges defined in the objective function. We can then query the study to find out which hyperparameters performed best.

print("Number of finished trials: ", len(study.trials))
print("Best trial:")
trial = study.best_trial
print("  Value: ", trial.value)
print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

Number of finished trials:  100
Best trial:
  Value:  1.0
  Params: 
    iterations: 503
    learning_rate: 0.06564339077069614
    depth: 6
    l2_leaf_reg: 7.546635702360232e-06
    bootstrap_type: Bayesian
    random_strength: 1.4799844388224288e-07
    bagging_temperature: 0.19366957870297075
    od_type: IncToDec
    od_wait: 20

Create the model with the best hyperparameters

Now that Optuna has identified the best combination of hyperparameters for our CatBoostClassifier, we can create a new model with these hyperparameters and train it on the training set. Unpacking trial.params with ** passes the hyperparameters that Optuna identified as best straight into the model.

model = catboost.CatBoostClassifier(**trial.params, verbose=False)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Evaluate the model

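Finally, we can evaluate the model on the test set and see how well it performs. The base model was already solid, but the tuned model now classifies all 54 test samples correctly, giving 100% accuracy. Treat this figure with some caution, though: the objective function scored every trial on this same test set, so the test set helped choose the hyperparameters and the result is an optimistic estimate of how the model will perform on new data. With only 54 test samples, the improvement over the base model also amounts to a single extra correct prediction.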

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        23
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        12

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54

print(accuracy_score(y_test, y_pred))

1.0

Save the model using Pickle

Now that we’ve got a model tuned to our specific dataset, we can save it for future use. We’ll use Pickle to save the ML model to disk. This will allow us to load the model at a later date and use it to make predictions on new data without the hassle of retraining or re-optimising it.

with open("catboost_model.pkl", "wb") as f:
    pickle.dump(model, f)

Matt Clarke, Friday, October 14, 2022

Matt Clarke

Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.