Problem description
This is my attempt at applying BayesSearchCV to CatBoost:
from catboost import CatBoostClassifier
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold

# Classifier
bayes_cv_tuner = BayesSearchCV(
    estimator=CatBoostClassifier(
        silent=True
    ),
    search_spaces={
        'depth': (2, 16),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    },
    scoring='roc_auc',
    cv=StratifiedKFold(
        n_splits=2,
        shuffle=True,
        random_state=72
    ),
    n_jobs=1,
    n_iter=100,
    verbose=1,
    refit=True,
    random_state=72
)
Tracking the results:
import numpy as np
import pandas as pd

def status_print(optim_result):
    """Status callback during Bayesian hyperparameter search"""
    # Get all the models tested so far in DataFrame format
    all_models = pd.DataFrame(bayes_cv_tuner.cv_results_)
    # Report the number of models tried and the best result so far
    print('Model #{}\nBest ROC-AUC: {}\nBest params: {}\n'.format(
        len(all_models),
        np.round(bayes_cv_tuner.best_score_, 4),
        bayes_cv_tuner.best_params_
    ))
# Fit BayesCV
resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=status_print)
Results
The first 3 iterations work fine, but then I get a nonstop stream of messages like:
Iteration with suspicious time 7.55 sec ignored in overall statistics.
Iteration with suspicious time 739 sec ignored in overall statistics.
(...)
Any ideas about where I went wrong, or how I can improve this?
Recommended answer
Judging by the timings CatBoost has recorded so far, one of the iterations in the set of experiments that skopt is running is taking too long to complete.
If you investigate when this happens, by raising the verbosity of the classifier and using a callback to see which combination of parameters skopt is trying, you will probably find that the culprit is the depth parameter: the search slows down whenever CatBoost is asked to fit deeper trees.
You can also debug this with the following custom callback:
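import joblib

counter = 0

def onstep(res):
    global counter
    args = res.x            # best point found so far
    x0 = res.x_iters        # all points evaluated so far
    y0 = res.func_vals      # objective value at each evaluated point
    print('Last eval: ', x0[-1],
          ' - Score ', y0[-1])
    print('Current iter: ', counter,
          ' - Score ', res.fun,
          ' - Args: ', args)
    # Save the search history so that it survives an interrupted run
    joblib.dump((x0, y0), 'checkpoint.pkl')
    counter = counter + 1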
You can call it with:
resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=[onstep, status_print])
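To check whether the slow steps really coincide with large depth values, one option is to time each step as well. The following is only a minimal sketch: StepTimer is a hypothetical helper, not part of skopt or CatBoost, and it assumes the callback receives a result object with x_iters and func_vals, as in the onstep callback above. The first measurement also includes whatever happens between creating the timer and the first step.

```python
import time


class StepTimer:
    """Hypothetical callback: records wall-clock seconds between calls."""

    def __init__(self):
        self._last = time.perf_counter()
        self.records = []  # one (point, objective, seconds) tuple per step

    def __call__(self, res):
        now = time.perf_counter()
        elapsed = now - self._last
        self._last = now
        point = res.x_iters[-1]        # last point evaluated
        objective = res.func_vals[-1]  # objective value skopt stored for it
        self.records.append((point, objective, elapsed))
        print('Point: {} - Objective: {} - Took: {:.2f} sec'.format(
            point, objective, elapsed))
        # Nothing is returned, so the search is not stopped by this callback
```

It could then be passed alongside the other callbacks, for example callback=[StepTimer(), onstep, status_print], and the records list inspected afterwards to see which points took longest.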
I have actually run into the same problem in my own experiments: the complexity grows non-linearly as the depth increases, so CatBoost takes much longer to complete its iterations. A simple solution is to search a smaller space:
'depth':(2, 8)
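As a rough illustration of the non-linear growth, assume CatBoost's default symmetric trees, where a tree of depth d has 2^d leaves. The helper name leaves_per_tree is made up for this sketch, and real training time also depends on the data and the other parameters, so treat this only as an indication of scale:

```python
def leaves_per_tree(depth):
    """Number of leaves in a symmetric (oblivious) tree of the given depth."""
    return 2 ** depth


for depth in (2, 8, 16):
    print('depth', depth, '->', leaves_per_tree(depth), 'leaves')
```

Under that assumption, a depth-16 tree has 256 times as many leaves as a depth-8 tree.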
A depth of 8 is usually enough. In any case, you can first run skopt with the maximum depth set to 8 and then repeat the search with a larger maximum.
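One way to organise that staged approach is to build the search space from a function of the depth limit. This is just a sketch: make_search_space is a hypothetical helper, and the other ranges are copied unchanged from the question rather than being recommendations.

```python
def make_search_space(max_depth):
    """Hypothetical helper: the question's search space with a depth cap."""
    return {
        'depth': (2, max_depth),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    }


# First pass with shallow trees; raise the cap only if the results call for it
first_pass = make_search_space(8)
second_pass = make_search_space(12)
print(first_pass['depth'], second_pass['depth'])
```

Each dictionary would be passed as search_spaces to a separate BayesSearchCV instance, starting with first_pass.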