How to tune a LightGBM LGBMClassifier model with Optuna

Learn how to create a classification model using LightGBM’s LGBMClassifier and tune its hyperparameters using Optuna.

LightGBM is a gradient boosting framework that uses tree-based learning algorithms, much like the popular XGBoost. It supports both classification and regression tasks, and is known for its speed and accuracy. LightGBM was originally developed by Microsoft and is an open source project. It is often used in machine learning competitions, and is a popular choice for Kaggle users.

LightGBM has several advantages over other gradient boosting frameworks. It’s fast, scalable, and typically uses less memory than XGBoost. Its accuracy is often on a par with, or better than, other frameworks, and it handles large datasets well. Like XGBoost, it also supports parallel and GPU learning, which can make training very fast if you’ve got a powerful GPU.

In this post, we will use the LightGBM model to create a classification model and tune its hyperparameters using Optuna.

Install the packages

To get started, open a Jupyter notebook and install the LightGBM and Optuna packages from the Pip package management system. You can do this from within the notebook by putting an exclamation mark before the pip3 install command and then executing the code cell.

!pip3 install optuna
!pip3 install lightgbm

Load the packages

Next we’ll load the packages we need for this project. We’ll be using LightGBM for our model and Optuna for hyperparameter tuning. We’ll need the train_test_split function from scikit-learn to split our training and test data, and the accuracy_score and classification_report functions to evaluate our model. We’ll load the data with load_wine and save our trained model using Pickle.

import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.datasets import load_wine
import optuna
from optuna.samplers import TPESampler
import pickle

Load the data

Next, we’ll load our dataset. To keep things simple we’ll use the wine dataset built into scikit-learn, so we can skip some of the feature engineering and data cleansing tasks you’d normally undertake when building a model and focus on model training and tuning. We’ll pass return_X_y=True to get the features and target back as separate X and y objects, and as_frame=True to return them as Pandas objects (a dataframe for X and a series for y). A handful of rows from X are shown below.

X, y = load_wine(return_X_y=True, as_frame=True)
     alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  od280/od315_of_diluted_wines  proline
 84    11.84        0.89  2.58               18.0       94.0           2.20        2.21                  0.22             2.35             3.05  0.79                          3.08    520.0
132    12.81        2.31  2.40               24.0       98.0           1.15        1.09                  0.27             0.83             5.70  0.66                          1.36    560.0
 58    13.72        1.43  2.50               16.7      108.0           3.40        3.67                  0.19             2.04             6.80  0.89                          2.87   1285.0
143    13.62        4.95  2.35               20.0       92.0           2.00        0.80                  0.47             1.02             4.40  0.91                          2.05    550.0
  0    14.23        1.71  2.43               15.6      127.0           2.80        3.06                  0.28             2.29             5.64  1.04                          3.92   1065.0

Examine the target variable

If you use the value_counts() method to print the target values stored in y, you’ll see that we have three classes. The classes are not perfectly balanced, but that’s not a problem for this experiment.

1    71
0    59
2    48
Name: target, dtype: int64
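
The counts above can be reproduced with a short snippet along these lines. It is a minimal sketch that reloads the data so it runs on its own; the class_counts variable name is ours rather than the article’s.

```python
from sklearn.datasets import load_wine

# Reload the wine data so the snippet is self-contained.
X, y = load_wine(return_X_y=True, as_frame=True)

# Count the rows in each class; value_counts() sorts by frequency.
class_counts = y.value_counts()
print(class_counts)
```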

Split the data into training and test sets

Now we need to split up our data into training and test sets. We will use 70% of the data for training and 30% for testing by defining the test_size value as 0.3. We’ll also add a random_state value to ensure we get the same results each time we run the task.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
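
As a quick sanity check on the split, a sketch like the following prints the shape of each piece. With 178 rows and test_size=0.3, scikit-learn rounds the test set up to 54 rows, which leaves 124 for training.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True, as_frame=True)

# Same split as in the article: 30% held out, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# Each shape is (rows, features).
print(X_train.shape, X_test.shape)  # (124, 13) (54, 13)
```

If the class imbalance were more severe, passing stratify=y to train_test_split would keep the class proportions similar in both sets. The article does not do this, so adding it would change the results shown here.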

Create the LightGBM classification model

Now we can use LightGBM to create a classification model via the LGBMClassifier class. We will use the default parameters for now as this is just a base model and we’re going to use Optuna to determine the optimal parameters for maximising the results on our dataset. Once we’ve defined the model, we’ll use fit() to train this on our training data.

base_model = lgb.LGBMClassifier().fit(X_train, y_train)

Make predictions

Now that the LightGBM classification model has been trained, we can use it to make predictions on the test data. The predictions are stored in the variable y_pred, which is a NumPy array. If you print this you’ll see the class predicted for each row in the test dataset.

y_pred = base_model.predict(X_test)

array([2, 1, 0, 1, 0, 2, 1, 0, 2, 1, 0, 0, 1, 0, 1, 1, 2, 0, 1, 0, 0, 1,
       2, 0, 0, 2, 0, 0, 0, 2, 1, 2, 2, 0, 1, 1, 1, 1, 1, 0, 0, 1, 2, 0,
       0, 0, 1, 0, 0, 0, 1, 2, 2, 0])

Evaluate the model

To evaluate the performance of our classifier we’ll use two metrics from scikit-learn: accuracy and classification report. The accuracy is the number of correct predictions divided by the total number of predictions. The classification report provides a breakdown of each class by precision, recall, f1-score and support.

The scores we gain are already very good, with an accuracy of 98.148%. However, we might be able to get further improvement by tuning the model’s hyperparameters using Optuna.

accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98        23
           1       1.00      0.95      0.97        19
           2       1.00      1.00      1.00        12

    accuracy                           0.98        54
   macro avg       0.99      0.98      0.98        54
weighted avg       0.98      0.98      0.98        54
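
To make the accuracy figure concrete, here is a small worked sketch in plain Python. The accuracy() helper and the six example labels are hypothetical and only illustrate the definition given above; the final line shows that an accuracy of 98.148% on 54 test rows is consistent with 53 correct predictions.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: five of the six predictions are correct.
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 1, 0, 2, 2]
print(accuracy(y_true, y_hat))

# 53 correct predictions out of 54 test rows.
print(round(53 / 54, 5))  # 0.98148
```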

Use Optuna to tune the LightGBM model

To try to maximise the performance of our LightGBM classification model we’ll now tune the model’s hyperparameters. Hyperparameters are settings chosen before training rather than learned from the data, and making fine adjustments to them can yield greater accuracy and better overall results. We’ll use Optuna for our hyperparameter tuning as it’s typically quicker than an exhaustive search with scikit-learn’s GridSearchCV and often generates better results.

To use Optuna you first need to create an objective function. This includes a dictionary of the model’s hyperparameters you want to test, as well as the ranges of values you want to cover during testing. Optuna will do a series of runs and test different combinations of hyperparameters by fitting them to your model and then measuring the accuracy (or whatever objective you set) before finally returning the best parameters.

def objective(trial):
    """Objective function to be maximised: accuracy on the test set."""
    param = {
        "objective": "multiclass",
        "metric": "multi_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "num_class": 3,
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    gbm = lgb.LGBMClassifier(**param).fit(X_train, y_train)
    preds = gbm.predict(X_test)
    accuracy = accuracy_score(y_test, preds)
    return accuracy

Run the Optuna study

To run the Optuna study and identify the best hyperparameters for our LGBMClassifier model we need to create a sampler. We’re using TPESampler, which uses the Tree-structured Parzen Estimator algorithm, and giving it a fixed seed. We want to maximise the accuracy of our model during tuning, so we’ll pass direction="maximize" to create_study() along with our sampler. We’ll then use optimize() to run 100 trials against our objective function.

sampler = TPESampler(seed=1)
study = optuna.create_study(study_name="lightgbm", direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=100)

Examine the Optuna study results

To examine the results of our Optuna study we can print some of the values stored on the study object. Of the 100 trials we ran, trial number 14 generated the best result, with an accuracy of 1.0, or 100%. The study.best_params attribute holds the winning hyperparameters, which we can use in our final tuned model.

print('Best parameters:', study.best_params)
Best parameters: {'lambda_l1': 9.818554108154862, 'lambda_l2': 2.4055010791348247e-06, 'num_leaves': 4, 'feature_fraction': 0.5515741134287729, 'bagging_fraction': 0.6255538253881087, 'bagging_freq': 2, 'min_child_samples': 17}
print('Best value:', study.best_value)
Best value: 1.0
print('Best trial:', study.best_trial)
Best trial: FrozenTrial(number=14, values=[1.0], datetime_start=datetime.datetime(2022, 10, 14, 7, 7, 43, 224346), datetime_complete=datetime.datetime(2022, 10, 14, 7, 7, 43, 259048), params={'lambda_l1': 9.818554108154862, 'lambda_l2': 2.4055010791348247e-06, 'num_leaves': 4, 'feature_fraction': 0.5515741134287729, 'bagging_fraction': 0.6255538253881087, 'bagging_freq': 2, 'min_child_samples': 17}, distributions={'lambda_l1': FloatDistribution(high=10.0, log=True, low=1e-08, step=None), 'lambda_l2': FloatDistribution(high=10.0, log=True, low=1e-08, step=None), 'num_leaves': IntDistribution(high=256, log=False, low=2, step=1), 'feature_fraction': FloatDistribution(high=1.0, log=False, low=0.4, step=None), 'bagging_fraction': FloatDistribution(high=1.0, log=False, low=0.4, step=None), 'bagging_freq': IntDistribution(high=7, log=False, low=1, step=1), 'min_child_samples': IntDistribution(high=100, log=False, low=5, step=1)}, user_attrs={}, system_attrs={}, intermediate_values={}, trial_id=14, state=TrialState.COMPLETE, value=None)

Create the model using the best parameters

Finally, we can pass the best hyperparameters identified by Optuna back to LGBMClassifier and fit our final model. There’s no need to manually type out a dictionary of params as you would do normally. Instead, you can simply unpack the study’s results into the constructor with **study.best_params. LightGBM prints a warning for each parameter passed under its native name (such as lambda_l1) rather than its scikit-learn style alias (reg_alpha); the values we set are the ones that get used.

model = lgb.LGBMClassifier(**study.best_params).fit(X_train, y_train)
[LightGBM] [Warning] lambda_l1 is set=9.818554108154862, reg_alpha=0.0 will be ignored. Current value: lambda_l1=9.818554108154862
[LightGBM] [Warning] bagging_fraction is set=0.6255538253881087, subsample=1.0 will be ignored. Current value: bagging_fraction=0.6255538253881087
[LightGBM] [Warning] lambda_l2 is set=2.4055010791348247e-06, reg_lambda=0.0 will be ignored. Current value: lambda_l2=2.4055010791348247e-06
[LightGBM] [Warning] feature_fraction is set=0.5515741134287729, colsample_bytree=1.0 will be ignored. Current value: feature_fraction=0.5515741134287729
[LightGBM] [Warning] bagging_freq is set=2, subsample_freq=0 will be ignored. Current value: bagging_freq=2

LGBMClassifier(bagging_fraction=0.6255538253881087, bagging_freq=2,
               feature_fraction=0.5515741134287729, lambda_l1=9.818554108154862,
               lambda_l2=2.4055010791348247e-06, min_child_samples=17,

Evaluate the tuned LightGBM model

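Now that it’s been trained, we can run the tuned model on our test data again and evaluate its performance using the accuracy score and the classification report. The Optuna hyperparameter tuning did the trick and our model now achieves perfect accuracy across all classes on the test set. Bear in mind that the hyperparameters were selected using this same test set, so the score is likely to be optimistic for unseen data.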

y_pred = model.predict(X_test)
accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        23
           1       1.00      1.00      1.00        19
           2       1.00      1.00      1.00        12

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54

Save the model using Pickle

Finally, we’ll save the model using Pickle. Using Pickle to save the model means we can load it later and use it to make predictions on new data without the need to retrain it.

filename = "lightgbm.pkl"
with open(filename, "wb") as f:
    pickle.dump(model, f)

Matt Clarke, Thursday, January 19, 2023

Matt Clarke Matt is an Ecommerce and Marketing Director who uses data science to help in his work. Matt has a Master's degree in Internet Retailing (plus two other Master's degrees in different fields) and specialises in the technical side of ecommerce and marketing.