python - What exactly does KFold in python do?

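I am working through this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle
I have reached part nine, making predictions. There, some data is held in a dataframe called titanic, which is then split into folds using: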

# Generate cross validation folds for the titanic dataset.  It returns the row indices corresponding to train and test.
# We set random_state to ensure we get the same splits every time we run this.
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

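I am not sure what exactly it is doing or what kind of object kf is. I tried reading the documentation, but it did not help much. Also, there are three folds (n_folds=3), so why does the following line only access train and test (and how do I know they are called train and test)?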

for train, test in kf:

Best answer

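KFold provides train/test indices to split data into train and test sets. It splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used once as the validation set while the k - 1 remaining folds form the training set (source: the scikit-learn documentation).
For example, say you have data indices from 1 to 10. If you use n_folds=k, then in the i-th iteration (i <= k) you get the i-th fold as the test indices, and the remaining k - 1 folds (excluding that i-th fold) together as the train indices. The loop therefore runs k times, once per fold, and each pass yields exactly one pair: train indices and test indices.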
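The fold bookkeeping described above can be sketched in plain Python. This is only an illustration of the idea, not scikit-learn's actual implementation: contiguous_kfold is a hypothetical helper name, indices are 0-based, and it assumes that when the sample count is not divisible by k, the earliest folds each take one extra sample.

```python
def contiguous_kfold(n_samples, k):
    """Yield (train, test) index lists for k consecutive, unshuffled folds.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # The first (n_samples % k) folds get one extra element so every sample
    # lands in exactly one test fold.
    base, extra = divmod(n_samples, k)
    start = 0
    for i in range(k):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        # Everything outside the current test fold is training data.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


splits = list(contiguous_kfold(10, 3))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, train, test)
```

With 10 samples and k=3, the loop body runs three times, and each pass hands back one train list and one test list. That is why the for statement unpacks only two names even though there are three folds.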
An example:

import numpy as np
from sklearn.cross_validation import KFold

x = [1,2,3,4,5,6,7,8,9,10,11,12]
kf = KFold(12, n_folds=3)

for train_index, test_index in kf:
    print (train_index, test_index)

Output:
Fold 1: [4 5 6 7 8 9 10 11] [0 1 2 3]
Fold 2: [0 1 2 3 8 9 10 11] [4 5 6 7]
Fold 3: [0 1 2 3 4 5 6 7] [8 9 10 11]
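Update for importing in sklearn 0.20:
As of version 0.20, KFold is no longer importable from sklearn.cross_validation; it lives in the sklearn.model_selection module. To import KFold in sklearn 0.20+, use from sklearn.model_selection import KFold. See the current KFold documentation for details.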
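As a hedged sketch of the earlier example rewritten for the sklearn.model_selection API: the constructor there takes n_splits instead of the sample count and n_folds, and the indices come from calling split() on the data instead of iterating over the object itself. The variable names x_train and x_test are illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

x = np.arange(1, 13)      # the same 12 values as in the example above
kf = KFold(n_splits=3)    # n_splits plays the role of the old n_folds argument

# split() yields one (train_indices, test_indices) pair per fold.
# The loop variable names are arbitrary; only the order (train, test) matters.
splits = list(kf.split(x))
for fold, (train_index, test_index) in enumerate(splits, start=1):
    print(f"Fold {fold}:", train_index, test_index)
    # The index arrays are used to select the actual rows for this fold.
    x_train, x_test = x[train_index], x[test_index]
```

Each element of splits is a pair of NumPy integer arrays, so they can be used directly to index an array, or passed to .iloc for a pandas dataframe.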

Source: Stack Overflow, https://stackoverflow.com/questions/36063014/