Question
I am using XGBoost's cv function to find the optimal number of boosting rounds for my model. I would be very grateful if someone could confirm (or refute) that the optimal number of rounds is:
import xgboost as xgb

# params, dvisibletrain (an xgb.DMatrix) and SEED are defined elsewhere
estop = 40
res = xgb.cv(params, dvisibletrain, num_boost_round=1000000000, nfold=5, early_stopping_rounds=estop, seed=SEED, stratified=True)
best_nrounds = res.shape[0] - estop
best_nrounds = int(best_nrounds / 0.8)
That is, the total number of rounds completed is res.shape[0], so to get the optimal number of rounds we subtract the number of early-stopping rounds.
Then we scale up the number of rounds based on the fraction of the data used for validation. Is that correct?
Recommended answer
Yes, that sounds correct, provided that when you compute best_nrounds = int(best_nrounds / 0.8) you are accounting for the validation set being 20% of your whole training data (another way of saying that you performed 5-fold cross-validation).
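One caveat on the subtraction step: in recent XGBoost releases, as far as I know, xgb.cv truncates its result at the best iteration when early stopping fires, so res.shape[0] may already be the best round count and subtracting estop would undershoot. A way to sidestep the question is to locate the best row directly from the per-round mean validation scores. The sketch below is plain Python; the helper name best_round_from_scores is hypothetical, and the column you would feed it (for example res["test-rmse-mean"].tolist()) is an assumption that depends on your eval metric.

```python
def best_round_from_scores(mean_scores, lower_is_better=True):
    """Return the 1-based number of rounds with the best mean validation score.

    mean_scores: per-round mean validation metric, in round order.
    lower_is_better: True for losses (rmse, logloss), False for auc-like metrics.
    """
    if not mean_scores:
        raise ValueError("mean_scores must not be empty")
    pick = min if lower_is_better else max
    # Index of the best score; ties resolve to the earliest round.
    best_index = pick(range(len(mean_scores)), key=lambda i: mean_scores[i])
    return best_index + 1


# Hypothetical scores standing in for a cv result column.
scores = [0.50, 0.42, 0.39, 0.40, 0.41]
print(best_round_from_scores(scores))  # 3
```

This gives the same answer whether or not the cv result includes the trailing early-stopping rounds.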
The rule can be generalized as:
n_folds = 5
best_nrounds = int((res.shape[0] - estop) / (1 - 1 / n_folds))
Or, if you don't perform CV but use a single validation split:
validation_slice = 0.2
best_nrounds = int((res.shape[0] - estop) / (1 - validation_slice))
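Both variants divide the best round count by the fraction of data that was actually used for training. A minimal sketch that wraps this in one place, assuming the best round count has already been determined; the function names scale_rounds and train_fraction_for_folds are hypothetical, and the scaling itself is a heuristic rather than a guarantee:

```python
def train_fraction_for_folds(n_folds):
    """Fraction of the data each CV model trains on, e.g. 0.8 for 5 folds."""
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    return 1.0 - 1.0 / n_folds


def scale_rounds(best_cv_rounds, train_fraction):
    """Scale a round count found on a subset up to the full training set."""
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")
    # Same truncating division as in the rule above.
    return int(best_cv_rounds / train_fraction)


# 5-fold CV: each model saw 80% of the data.
print(scale_rounds(130, train_fraction_for_folds(5)))  # 162

# Single hold-out split with 20% kept for validation.
validation_slice = 0.2
print(scale_rounds(130, 1 - validation_slice))  # 162
```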
You can see an example of this rule being applied on Kaggle (see the comments).