Problem Description
I am trying to do a hyperparameter search on XGBoost using scikit-learn's GridSearchCV. During the grid search I would like it to stop early, since that reduces search time drastically and (I expect) improves results on my prediction/regression task. I am using XGBoost via its Scikit-Learn API.
model = xgb.XGBRegressor()
GridSearchCV(model, paramGrid, verbose=verbose,
             fit_params={'early_stopping_rounds': 42},
             cv=TimeSeriesSplit(n_splits=cv).get_n_splits([trainX, trainY]),
             n_jobs=n_jobs, iid=iid).fit(trainX, trainY)
I tried to pass the early-stopping parameters via fit_params, but then it throws the following error, essentially because the validation set required for early stopping is missing:
/opt/anaconda/anaconda3/lib/python3.5/site-packages/xgboost/callback.py in callback(env=XGBoostCallbackEnv(model=<xgboost.core.Booster o...teration=4000, rank=0, evaluation_result_list=[]))
187 else:
188 assert env.cvfolds is not None
189
190 def callback(env):
191 """internal function"""
--> 192 score = env.evaluation_result_list[-1][1]
score = undefined
env.evaluation_result_list = []
193 if len(state) == 0:
194 init(env)
195 best_score = state['best_score']
196 best_iteration = state['best_iteration']
How can I apply GridSearchCV to XGBoost while using early_stopping_rounds?
Note: the model works without the grid search, and GridSearchCV works without fit_params={'early_stopping_rounds': 42}.
Accepted Answer
An update to @glao's answer and a response to @Vasim's comment/question, as of sklearn 0.21.3 (note that fit_params has been moved out of the instantiation of GridSearchCV and into the fit() method; also, the import specifically pulls in the sklearn wrapper module from xgboost):
import xgboost.sklearn as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit

cv = 2

trainX = [[1], [2], [3], [4], [5]]
trainY = [1, 2, 3, 4, 5]

# these are the evaluation sets
testX = trainX
testY = trainY

paramGrid = {"subsample": [0.5, 0.8]}

fit_params = {"early_stopping_rounds": 42,
              "eval_metric": "mae",
              "eval_set": [[testX, testY]]}

model = xgb.XGBRegressor()

# Pass the TimeSeriesSplit object itself as cv; calling get_n_splits()
# would hand GridSearchCV a plain integer, which selects ordinary k-fold
# splitting instead of forward-chaining time-series splits.
gridsearch = GridSearchCV(model, paramGrid, verbose=1,
                          cv=TimeSeriesSplit(n_splits=cv))

gridsearch.fit(trainX, trainY, **fit_params)
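A side note on the cv argument: get_n_splits() returns a plain integer, so passing its result to GridSearchCV requests ordinary k-fold splitting rather than forward-chaining time-series splits. A minimal sketch using scikit-learn only, reusing the toy trainX data from above, shows the difference:

```python
from sklearn.model_selection import TimeSeriesSplit

trainX = [[1], [2], [3], [4], [5]]

tscv = TimeSeriesSplit(n_splits=2)

# get_n_splits() just returns the integer 2; handing that to GridSearchCV's
# cv argument selects plain 2-fold CV, not time-ordered splits.
print(tscv.get_n_splits(trainX))  # 2

# Passing the splitter object itself keeps the forward-chaining behaviour:
# each training window contains only samples that precede the test window.
for train_idx, test_idx in tscv.split(trainX):
    print(list(train_idx), list(test_idx))
# [0, 1, 2] [3]
# [0, 1, 2, 3] [4]
```

This is why the corrected code above passes TimeSeriesSplit(n_splits=cv) directly to GridSearchCV.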