What is the difference between KFold and ShuffleSplit CV?

Question

It seems like KFold generates the same indices every time the object is iterated over, while ShuffleSplit generates different indices every time. Is this correct? If so, when should one be used over the other?

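# Legacy API (scikit-learn < 0.20): the cross_validation module was deprecated
# in 0.18 in favor of sklearn.model_selection and removed in 0.20.
from sklearn import cross_validation

cv = cross_validation.KFold(10, n_folds=2, shuffle=True, random_state=None)
cv2 = cross_validation.ShuffleSplit(10, n_iter=2, test_size=0.5)
print(list(iter(cv)))
print(list(iter(cv)))
print(list(iter(cv2)))
print(list(iter(cv2)))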

This produces the following output:

[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]
[(array([1, 3, 5, 8, 9]), array([0, 2, 4, 6, 7])), (array([0, 2, 4, 6, 7]), array([1, 3, 5, 8, 9]))]
[(array([4, 6, 3, 2, 7]), array([8, 1, 9, 0, 5])), (array([3, 6, 7, 0, 5]), array([9, 1, 8, 4, 2]))]
[(array([3, 0, 2, 1, 7]), array([5, 6, 9, 4, 8])), (array([0, 7, 1, 3, 8]), array([6, 2, 5, 4, 9]))]
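
The repeated KFold output is consistent with where each splitter does its shuffling. My understanding of the legacy `sklearn.cross_validation` module is that `KFold(..., shuffle=True)` permuted its indices once, in the constructor, and replayed that order on every pass, while `ShuffleSplit` drew new permutations on each pass (with `random_state=None`, from NumPy's global random state). The classes below are hypothetical toys built on Python's `random` module, not scikit-learn code; they reproduce only that difference.

```python
import random


class ShuffleOnceKFold:
    """Hypothetical toy: shuffles the indices once, when the object is built."""

    def __init__(self, n, n_folds, seed=None):
        self.n_folds = n_folds
        self.idxs = list(range(n))
        random.Random(seed).shuffle(self.idxs)  # the only shuffle that ever happens

    def __iter__(self):
        for k in range(self.n_folds):
            # Simplification: take every n_folds-th index of the stored order
            # (scikit-learn cuts contiguous blocks instead); still a partition.
            test = sorted(self.idxs[k::self.n_folds])
            train = sorted(set(self.idxs) - set(test))
            yield train, test


class ShuffleEveryPassSplit:
    """Hypothetical toy: draws a fresh permutation for every split it yields."""

    def __init__(self, n, n_iter, test_size, seed=None):
        self.n, self.n_iter, self.test_size = n, n_iter, test_size
        self.rng = random.Random(seed)  # state keeps advancing across passes

    def __iter__(self):
        n_test = int(round(self.n * self.test_size))
        for _ in range(self.n_iter):
            perm = list(range(self.n))
            self.rng.shuffle(perm)
            yield perm[n_test:], perm[:n_test]


cv = ShuffleOnceKFold(10, n_folds=2)
cv2 = ShuffleEveryPassSplit(10, n_iter=2, test_size=0.5)
print(list(cv) == list(cv))    # True: the stored order is replayed on every pass
print(list(cv2) == list(cv2))  # new permutations are drawn on each pass
```

In the newer `sklearn.model_selection` API, as far as I know, `KFold(n_splits=2, shuffle=True, random_state=None).split(X)` reshuffles on every call, so the exact behavior in the question is specific to the legacy module; passing an integer `random_state` makes either splitter return the same splits on every call.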

Recommended answer

Difference between the KFold and ShuffleSplit output

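KFold divides your dataset into a prespecified number of folds, and every sample must be in one and only one fold. A fold is a subset of your dataset. Because each fold serves as the test set exactly once, every sample is tested exactly once across the k rounds.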
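
A minimal sketch of the k-fold partition, assuming contiguous blocks of a (possibly shuffled) index list; `kfold_splits` is a hypothetical helper, not scikit-learn's API. It checks the property described above: train and test never overlap within a round, and every sample lands in exactly one test fold.

```python
import random


def kfold_splits(n, k, shuffle=False, seed=None):
    """Hypothetical helper: split range(n) into k disjoint folds.

    Yields (train, test) pairs; each fold is the test set exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idxs = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idxs)
    base, extra = divmod(n, k)  # the first `extra` folds get one extra sample
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = idxs[start:stop]
        train = idxs[:start] + idxs[stop:]  # all the other folds
        yield train, test
        start = stop


splits = list(kfold_splits(10, 3, shuffle=True, seed=0))
test_counts = [0] * 10
for train, test in splits:
    assert not set(train) & set(test)  # train and test never overlap
    for i in test:
        test_counts[i] += 1

print([len(test) for _, test in splits])  # [4, 3, 3]
print(test_counts)                        # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```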

ShuffleSplit randomly samples your entire dataset during each iteration to generate a training set and a test set. The test_size and train_size parameters control how large the test and training sets should be in each iteration. Since you are sampling from the entire dataset during each iteration, values selected during one iteration could be selected again during another iteration.
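
For contrast, a sketch of the ShuffleSplit idea with a hypothetical `shuffle_split` helper (again not the library API; turning the `test_size`/`train_size` fractions into counts with `round` is a simplification and may differ from scikit-learn's rounding). Each iteration permutes the whole dataset independently, so the per-sample test counts are no longer all 1.

```python
import random


def shuffle_split(n, n_iter, test_size, train_size=None, seed=None):
    """Hypothetical helper: n_iter independent random train/test splits of range(n).

    test_size and train_size are fractions of n; if train_size is None, the
    training set is everything that is not in the test set.
    """
    n_test = int(round(n * test_size))
    n_train = n - n_test if train_size is None else int(round(n * train_size))
    if n_test + n_train > n:
        raise ValueError("train_size + test_size exceeds the dataset")
    rng = random.Random(seed)
    for _ in range(n_iter):
        perm = list(range(n))
        rng.shuffle(perm)  # a fresh permutation of the WHOLE dataset each time
        yield perm[n_test:n_test + n_train], perm[:n_test]


test_counts = [0] * 10
for train, test in shuffle_split(10, n_iter=4, test_size=0.3, seed=0):
    assert len(test) == 3 and len(train) == 7
    assert not set(train) & set(test)  # disjoint within one iteration...
    for i in test:
        test_counts[i] += 1

# ...but not across iterations: a sample may be tested several times or never.
print(test_counts)
print(sum(test_counts))  # 12 = 4 iterations x 3 test samples
```

Unlike the k-fold counts, `test_counts` here will usually contain zeros and values above 1, which is the "selected again during another iteration" effect described above.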

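Summary: ShuffleSplit draws an independent random train/test split on each iteration, so test sets can overlap across iterations; KFold just divides the dataset into k disjoint folds, so each sample is tested exactly once.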

Differences during validation

In KFold, each round uses one fold as the test set and all the remaining folds as the training set. In ShuffleSplit, round n uses only the training and test sets drawn in iteration n. As your dataset grows, cross-validation takes longer, which makes ShuffleSplit an increasingly attractive alternative: if your algorithm can be trained adequately on a chosen percentage of the data instead of on all k-1 folds, ShuffleSplit lets you set that percentage (and the number of iterations) directly.
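
The cost argument can be made concrete by counting training samples per fit. This is a crude proxy that assumes training time grows roughly linearly with training-set size, which does not hold for every algorithm; `kfold_cost` and `shuffle_split_cost` are hypothetical helpers for illustration only.

```python
def kfold_cost(n, k):
    """(fits, training samples per fit, total) for k-fold CV on n samples."""
    train = n - n // k  # each fit uses the other k-1 folds, about (k-1)/k of n
    return k, train, k * train


def shuffle_split_cost(n, n_iter, train_size):
    """(fits, training samples per fit, total) for ShuffleSplit on n samples."""
    train = int(round(n * train_size))  # fraction chosen freely, not tied to k
    return n_iter, train, n_iter * train


n = 100_000
print(kfold_cost(n, k=10))                              # (10, 90000, 900000)
print(shuffle_split_cost(n, n_iter=5, train_size=0.2))  # (5, 20000, 100000)
```

In k-fold, k fixes both the number of fits and the training fraction; with ShuffleSplit the two are independent knobs, which is what makes it easier to trade accuracy of the estimate for runtime on large datasets.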

10-12 23:14