Problem description
I'm working with data in which each patient can have a different number of training examples. When running XGBoost CV, I want to make sure that all data from a given patient ends up in the same fold, so I need folds that may contain different numbers of indices.
At the moment, when I pass a list of NumPy arrays containing indices via the 'folds' parameter of the xgb.cv function, I get:
dtrain = dall.slice(np.concatenate([idset[i] for i in range(nfold) if k != i]))
ValueError: zero-dimensional arrays cannot be concatenated
I have implemented the same procedure in R with no problems by passing my custom folds as a list in which each element is a vector of one test fold's indices.
Could you please advise on the proper way to pass custom indices to the Python XGBoost CV function? Thanks!
Recommended answer
This is an old question, but I am writing up an answer because it came up in a Google search when I was having a similar problem.
I wanted to use TimeSeriesSplit with xgboost cv but couldn't do so directly, because the folds parameter expects a KFold or StratifiedKFold object. However, you can supply your own indices as a list of (train, test) tuples, as shown below:
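import xgboost as xgb

# Each fold is a (train_indices, test_indices) tuple of row positions in trainDMat.
# Index 4 was listed in both train1 and test1; it is dropped from test1 to avoid leakage.
train1 = [0, 1, 2, 3, 4]
test1 = [5, 6, 7, 8]
train2 = [9, 10, 11, 12, 13]
test2 = [14, 15, 16, 17, 18]
train3 = [19, 20, 21, 22, 23, 24]
test3 = [25, 26, 27, 28, 29, 30]
tsFolds = [(train1, test1), (train2, test2), (train3, test3)]

# parameters, trainDMat, num_boost_round, early_stopping_rounds and seed
# are assumed to be defined earlier in your script.
xgbCV = xgb.cv(
    params=parameters,
    dtrain=trainDMat,
    num_boost_round=num_boost_round,
    nfold=len(tsFolds),
    folds=tsFolds,  # custom (train, test) index pairs
    metrics={'rmse'},
    early_stopping_rounds=early_stopping_rounds,
    verbose_eval=True,
    seed=seed
)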
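For the per-patient case in the question, the same list-of-tuples format can be built from a column of patient IDs. The following is only a minimal sketch in plain Python, not something taken from the xgboost source: `group_folds` and `patient_ids` are hypothetical names introduced here, and the greedy balancing is just one reasonable way to spread patients across folds of unequal size.

```python
from collections import defaultdict


def group_folds(groups, n_folds):
    """Build (train_idx, test_idx) tuples so that every group (e.g. a
    patient) appears in exactly one test fold. Hypothetical helper."""
    rows_by_group = defaultdict(list)
    for row, group in enumerate(groups):
        rows_by_group[group].append(row)
    if n_folds > len(rows_by_group):
        raise ValueError("n_folds cannot exceed the number of distinct groups")

    # Greedy balancing: take the largest groups first and put each one
    # into the test fold that currently holds the fewest rows.
    test_sets = [[] for _ in range(n_folds)]
    for group in sorted(rows_by_group, key=lambda g: -len(rows_by_group[g])):
        smallest = min(range(n_folds), key=lambda k: len(test_sets[k]))
        test_sets[smallest].extend(rows_by_group[group])

    all_rows = set(range(len(groups)))
    folds = []
    for test in test_sets:
        train = sorted(all_rows - set(test))  # every row not in this test fold
        folds.append((train, sorted(test)))
    return folds


# Example: four patients with 3, 1, 2 and 4 rows respectively.
patient_ids = ["a", "a", "a", "b", "c", "c", "d", "d", "d", "d"]
folds = group_folds(patient_ids, n_folds=3)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The resulting `folds` list has the same shape as `tsFolds` above, so it should be usable as `folds=folds` in the `xgb.cv` call. If scikit-learn is available, `list(GroupKFold(n_splits=3).split(X, y, groups=patient_ids))` is expected to produce splits of the same kind without a hand-written helper.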