Problem description
I would like to use the xgboost cv function to find the best parameters for my training data set, but I am confused by the API. How do I find the best parameter? Is this similar to sklearn's grid_search cross-validation function? How can I find which of the options for the max_depth parameter ([2,4,6]) was determined to be optimal?
from sklearn.datasets import load_iris
import xgboost as xgb
iris = load_iris()
DTrain = xgb.DMatrix(iris.data, iris.target)
x_parameters = {"max_depth":[2,4,6]}
xgb.cv(x_parameters, DTrain)
...
Out[6]:
test-rmse-mean test-rmse-std train-rmse-mean train-rmse-std
0 0.888435 0.059403 0.888052 0.022942
1 0.854170 0.053118 0.851958 0.017982
2 0.837200 0.046986 0.833532 0.015613
3 0.829001 0.041960 0.824270 0.014501
4 0.825132 0.038176 0.819654 0.013975
5 0.823357 0.035454 0.817363 0.013722
6 0.822580 0.033540 0.816229 0.013598
7 0.822265 0.032209 0.815667 0.013538
8 0.822158 0.031287 0.815390 0.013508
9 0.822140 0.030647 0.815252 0.013494
Recommended answer
Grid search evaluates a model with varying parameters to find the best possible combination of them.
The sklearn docs talk a lot about cross-validation (CV). Grid search and CV can be used in combination, but they have very different purposes.
You might be able to fit xgboost into sklearn's grid search functionality. Check out the sklearn interface to xgboost for the smoothest application.