Over the past year or so, the Optuna package has quickly become a favourite among data scientists for hyperparameter tuning on machine learning models, and for good reason. It’s lightweight, easy to use, very efficient at finding good hyperparameters, and it’s much faster than exhaustive tools like GridSearchCV.
Unlike GridSearchCV, Optuna doesn’t require you to specify a grid of hyperparameter values to search over. Instead, you specify a range for each hyperparameter, and Optuna samples values within those ranges, using the results of earlier trials to decide what to try next. This makes it much more efficient than GridSearchCV, which can take a very long time to run if you have a large number of hyperparameters to tune.
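To make that concrete before we get to XGBoost, here’s the kind of toy study Optuna’s own documentation uses (you’ll need Optuna installed, which we’ll cover next). The function being minimised and the variable name x are purely illustrative.

import optuna

def toy_objective(trial):
    # Optuna samples x from a continuous range on each trial,
    # guided by previous results, rather than stepping through a fixed grid
    x = trial.suggest_float('x', -10.0, 10.0)
    return (x - 2) ** 2

study = optuna.create_study(direction='minimize')
study.optimize(toy_objective, n_trials=20)
print(study.best_params)  # should land reasonably close to {'x': 2.0}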
In this article, we’ll look at how to use Optuna for XGBoost hyperparameter tuning by tuning the parameters of an XGBClassifier model.
To get started, open a Jupyter notebook and use Pip to install the Optuna and XGBoost packages, if you don’t have them installed already.
!pip3 install optuna xgboost
Optuna currently throws some warnings about forthcoming deprecations (including for the suggest_loguniform() method we’ll use later), so we’ll hide them for now, as they don’t affect the functionality of the package.
import warnings
warnings.filterwarnings('ignore')
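Suppressing every warning like this is a little heavy-handed. If you prefer, a more targeted alternative is to silence just the deprecation-related categories:

# Silence only deprecation-related warnings, rather than everything
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=DeprecationWarning)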
Next, import the packages. We’ll be using the XGBoost classifier and the Optuna package, plus some scikit-learn packages for model evaluation. You can use any dataset you like, but for simplicity I’m using the wine classification dataset from scikit-learn, as it will allow us to skip the data preprocessing step.
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
from sklearn.datasets import load_wine
import optuna
Load the data, and split it into X and y variables using the return_X_y parameter. Set as_frame=True to return the data as a Pandas dataframe, which will make it easier to work with.
X, y = load_wine(return_X_y=True, as_frame=True)
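If you want to sanity-check what you’ve loaded: the wine dataset is small, with 178 rows, 13 numeric feature columns, and three target classes.

print(X.shape)           # (178, 13)
print(y.value_counts())  # row counts per target class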
Now split the data into training and test sets using the train_test_split function from scikit-learn. We’ll set the random_state parameter to 1, so that we get the same results every time we run the code, and we’ll allocate 30% of the data to the test set using test_size=0.3.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
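One optional tweak: the three wine classes aren’t perfectly balanced, so if you want the train and test sets to preserve the class proportions, you can pass stratify=y. Note that doing so would change the exact scores shown below.

# Stratified variant of the split, preserving class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)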
To see what sort of scores we can achieve with the XGBClassifier model from XGBoost, we’ll first fit a simple base model. We’ll set the use_label_encoder parameter to False, and we’ll set the eval_metric to mlogloss, the multi-class log loss.
model = XGBClassifier(use_label_encoder=False,
                      eval_metric='mlogloss')
model.fit(X_train, y_train)
Now that the base model has been trained, we can make predictions on the test set and evaluate the model.
y_pred = model.predict(X_test)
We’ll use the accuracy_score function from scikit-learn to get the accuracy score, and the classification_report function to get the precision, recall, and F1 scores. We get back an accuracy score of 96.30%, which is pretty good.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
Accuracy: 96.30%
print(classification_report(y_test, y_pred))
precision recall f1-score support
0 0.92 1.00 0.96 23
1 1.00 0.89 0.94 19
2 1.00 1.00 1.00 12
accuracy 0.96 54
macro avg 0.97 0.96 0.97 54
weighted avg 0.97 0.96 0.96 54
Next, we’ll use Optuna to tune the hyperparameters of the XGBoost model. We’ll start by creating an objective function, which will be passed to the study.optimize function. The objective function takes a trial parameter, an instance of Optuna’s Trial class, and returns the accuracy score.
def objective(trial):
    """Define the objective function"""
    params = {
        'max_depth': trial.suggest_int('max_depth', 1, 9),
        'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 1.0),
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
        'gamma': trial.suggest_loguniform('gamma', 1e-8, 1.0),
        'subsample': trial.suggest_loguniform('subsample', 0.01, 1.0),
        'colsample_bytree': trial.suggest_loguniform('colsample_bytree', 0.01, 1.0),
        'reg_alpha': trial.suggest_loguniform('reg_alpha', 1e-8, 1.0),
        'reg_lambda': trial.suggest_loguniform('reg_lambda', 1e-8, 1.0),
        'eval_metric': 'mlogloss',
        'use_label_encoder': False
    }

    # Fit the model
    optuna_model = XGBClassifier(**params)
    optuna_model.fit(X_train, y_train)

    # Make predictions
    y_pred = optuna_model.predict(X_test)

    # Evaluate predictions
    accuracy = accuracy_score(y_test, y_pred)
    return accuracy
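One caveat worth flagging: this objective scores each trial on the test set, so Optuna is effectively tuning to the test set and the final evaluation will look optimistic. A more robust variant scores each trial with cross-validation on the training data only, keeping the test set for the final check. Here’s a sketch using a trimmed-down version of the same search space (objective_cv is just an illustrative name):

from sklearn.model_selection import cross_val_score

def objective_cv(trial):
    """Score each trial with 5-fold cross-validation on the training set"""
    params = {
        'max_depth': trial.suggest_int('max_depth', 1, 9),
        'learning_rate': trial.suggest_loguniform('learning_rate', 0.01, 1.0),
        'n_estimators': trial.suggest_int('n_estimators', 50, 500),
        'eval_metric': 'mlogloss',
        'use_label_encoder': False
    }
    model = XGBClassifier(**params)
    # Mean accuracy across five folds; the test set stays untouched
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
    return scores.mean()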
Next, we need to define an Optuna study. We’ll set the direction parameter to maximize, as we want to maximise the accuracy score.
study = optuna.create_study(direction='maximize')
[I 2022-09-28 19:54:56,765] A new study created in memory with name: no-name-b8402a5f-29e1-44a1-9fd8-183717d58b3f
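By default, the study uses Optuna’s TPE sampler, which is stochastic, so each run of the search will explore slightly different values. If you want reproducible trials, you can optionally seed the sampler when creating the study:

# Optional: seed the default TPE sampler for reproducible searches
study = optuna.create_study(direction='maximize',
                            sampler=optuna.samplers.TPESampler(seed=1))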
Finally, we can run the objective function using the study.optimize function. We’ll set the n_trials parameter to 100, which means Optuna will run the objective function 100 times as it tries to find the best hyperparameters.
If you’ve ever used GridSearchCV for hyperparameter tuning, you’ll know that it can take a long time to run, especially if you have a large number of hyperparameters to tune. Optuna is much faster, as it uses Bayesian optimization to find the best hyperparameters.
study.optimize(objective, n_trials=100)
Depending on the speed of your data science workstation, the hyperparameter tuning should be complete in a minute or so. It will take much longer on a larger dataset, or if you define more hyperparameters to tune.
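Optuna logs one line per trial, so expect 100 lines of output from this run. If that’s too noisy, you can turn the logging down before calling study.optimize:

# Optional: only show warnings and errors, not per-trial log lines
optuna.logging.set_verbosity(optuna.logging.WARNING)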
Now we can print the best parameters, and the best accuracy score achieved during the study trials.
print('Number of finished trials: {}'.format(len(study.trials)))
print('Best trial:')
trial = study.best_trial
print(' Value: {}'.format(trial.value))
print(' Params: ')
for key, value in trial.params.items():
    print('    {}: {}'.format(key, value))
Number of finished trials: 100
Best trial:
Value: 1.0
Params:
max_depth: 6
learning_rate: 0.2236810727625855
n_estimators: 50
min_child_weight: 5
gamma: 0.10770487614463455
subsample: 0.29658342065443705
colsample_bytree: 0.08778804479025275
reg_alpha: 0.05958436598353962
reg_lambda: 0.1439741099392137
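Optuna also exposes the same information directly on the study object, which is handy when you don’t need the formatted printout:

print(study.best_value)   # the best accuracy achieved
print(study.best_params)  # a dict of the winning hyperparameters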
Now we’ll save the best parameters to a dictionary, and we’ll use the XGBClassifier class to create a new model with the best parameters.
params = trial.params
model = XGBClassifier(**params)
model.fit(X_train, y_train)
Once the model has been retrained with the best hyperparameters, we can make predictions on the test set and evaluate the model.
y_pred = model.predict(X_test)
We’ll use the same evaluation techniques as before, assessing performance with the accuracy_score function and the classification_report function. The results show that we now get an accuracy score of 100%, which is perfect. Bear in mind, though, that because the objective function scored each trial on this same test set, a perfect score here is likely a little optimistic; the cross-validation variant sketched earlier gives a fairer picture.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy after tuning: %.2f%%" % (accuracy * 100.0))
Accuracy after tuning: 100.00%
print(classification_report(y_test, y_pred))
precision recall f1-score support
0 1.00 1.00 1.00 23
1 1.00 1.00 1.00 19
2 1.00 1.00 1.00 12
accuracy 1.00 54
macro avg 1.00 1.00 1.00 54
weighted avg 1.00 1.00 1.00 54
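If you want to dig into how the study behaved, Optuna also ships a visualization module (it requires the plotly package) that can plot, among other things, the optimization history and the relative importance of each hyperparameter:

from optuna.visualization import plot_optimization_history, plot_param_importances

# How the best score improved over the 100 trials
plot_optimization_history(study).show()

# Which hyperparameters mattered most during the search
plot_param_importances(study).show()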
Matt Clarke, Tuesday, September 27, 2022