Problem Description
There is the absolutely helpful class GridSearchCV in scikit-learn for grid search and cross-validation, but I don't want to do cross-validation. I want to do grid search without cross-validation and use the whole data to train. To be more specific, I need to evaluate the models made by RandomForestClassifier with the "oob score" during grid search. Is there an easy way to do it? Or should I write a class myself?
The key points are:
- I want to do grid search in a simple way.
- I don't want to do cross-validation.
- I need to train with the whole dataset. (I don't want to split it into training and test data.)
- I need to evaluate with the oob score during grid search.
Recommended Answer
I would really advise against using OOB to evaluate a model, but it is useful to know how to run a grid search outside of GridSearchCV() (I frequently do this so I can save the CV predictions from the best grid for easy model stacking). I think the easiest way is to create your grid of parameters via ParameterGrid() and then just loop through every set of params. For example, assuming you have a grid dict named "grid" and an RF model object named "rf", you can do something like this:
# rf must be constructed with oob_score=True so that rf.oob_score_
# is populated after fit().
best_score = 0.0
best_grid = None
for g in ParameterGrid(grid):
    rf.set_params(**g)
    rf.fit(X, y)
    # save if best
    if rf.oob_score_ > best_score:
        best_score = rf.oob_score_
        best_grid = g

print("OOB: %0.5f" % best_score)
print("Grid:", best_grid)
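To make the loop above concrete, here is a minimal, self-contained sketch. The toy data from make_classification and the grid values (n_estimators, max_features) are illustrative assumptions; substitute your own X, y, and parameter grid:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterGrid

# Toy data standing in for your real X and y.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# oob_score=True is what makes rf.oob_score_ available after fit().
rf = RandomForestClassifier(oob_score=True, random_state=0)

# Illustrative grid; pick values that suit your problem.
grid = {"n_estimators": [50, 100], "max_features": ["sqrt", None]}

best_score = 0.0
best_grid = None
for g in ParameterGrid(grid):
    rf.set_params(**g)
    rf.fit(X, y)  # fit on the whole dataset, no train/test split
    if rf.oob_score_ > best_score:
        best_score = rf.oob_score_
        best_grid = g

print("OOB: %0.5f" % best_score)
print("Grid:", best_grid)
```

Note that each call to fit() refits the forest on the full data, so the OOB estimate plays the role the validation score would play in GridSearchCV.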