Why does a KFold generator created with shuffle give the same indices?

Question

With sklearn, when you create a new KFold object with shuffle set to true, it produces different, newly randomized fold indices. However, every generator obtained from a given KFold object yields the same indices for each fold, even when shuffle is true. Why does it work like this?

Example:

# Note: sklearn.cross_validation is the legacy module (removed in scikit-learn 0.20).
import numpy as np
from sklearn.cross_validation import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(4, n_folds=2, shuffle=True)

for fold in kf:
    print(fold)

print('---second round----')

for fold in kf:
    print(fold)

Output:

(array([2, 3]), array([0, 1]))
(array([0, 1]), array([2, 3]))
---second round----#same indices for the folds
(array([2, 3]), array([0, 1]))
(array([0, 1]), array([2, 3]))

This question was motivated by a comment on this answer. I decided to split it into a new question to prevent that answer from becoming too long.

Recommended answer

Iterating over the same KFold object again does not reshuffle the indices; shuffling happens only once, when the object is instantiated. KFold() never sees the data, but it knows the number of samples, so it shuffles an index array of that length. From the code that runs during instantiation of KFold:

if shuffle:
    rng = check_random_state(self.random_state)
    rng.shuffle(self.idxs)

Each time a generator is created to iterate through the indices of each fold, it uses the same shuffled indices and divides them in the same way.
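The behaviour described above can be reproduced without scikit-learn. The following is a minimal sketch, not the library's code: ToyKFold and its seed parameter are hypothetical names invented for illustration, it uses only the standard library, and it assumes the number of samples divides evenly by n_folds.

```python
import random


class ToyKFold(object):
    """Hypothetical stand-in for the legacy KFold; not scikit-learn code."""

    def __init__(self, n, n_folds=2, shuffle=False, seed=None):
        self.n_folds = n_folds
        self.idxs = list(range(n))
        if shuffle:
            # The only place a shuffle happens: once, at construction.
            random.Random(seed).shuffle(self.idxs)

    def __iter__(self):
        # Every call slices the same stored self.idxs in the same way.
        # Assumes len(self.idxs) is divisible by n_folds.
        fold_size = len(self.idxs) // self.n_folds
        for k in range(self.n_folds):
            test = self.idxs[k * fold_size:(k + 1) * fold_size]
            train = [i for i in self.idxs if i not in test]
            yield train, test


kf = ToyKFold(4, n_folds=2, shuffle=True)
first_round = list(kf)   # first generator over the object
second_round = list(kf)  # second generator, same stored order
print(first_round == second_round)  # True
```

Under this design, the way to get a fresh shuffle is to construct a new object, which matches what the question observed for a newly created KFold.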

Take a look at the code for KFold's base class, _PartitionIterator(with_metaclass(ABCMeta)), where __iter__ is defined. The __iter__ method in the base class calls _iter_test_indices in KFold to divide the indices and yield the train and test indices for each fold.
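The division of labour between the base class and KFold might look roughly like the sketch below. This is an illustration of the pattern only, assuming a simplified structure: PartitionIteratorSketch, KFoldSketch, and the seed parameter are hypothetical names, and the real classes differ in detail.

```python
import random


class PartitionIteratorSketch(object):
    """Hypothetical base class: turns test indices into (train, test) pairs."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # The base class owns iteration and asks the subclass for test indices.
        for test in self._iter_test_indices():
            test_set = set(test)
            train = [i for i in range(self.n) if i not in test_set]
            yield train, sorted(test)

    def _iter_test_indices(self):
        raise NotImplementedError


class KFoldSketch(PartitionIteratorSketch):
    """Hypothetical subclass: stores a shuffled order and slices it per fold."""

    def __init__(self, n, n_folds, seed=None):
        super(KFoldSketch, self).__init__(n)
        self.n_folds = n_folds
        self.idxs = list(range(n))
        random.Random(seed).shuffle(self.idxs)  # shuffled once, here

    def _iter_test_indices(self):
        # Reads the stored order; nothing is reshuffled per iteration.
        # Assumes n is divisible by n_folds.
        fold_size = self.n // self.n_folds
        for k in range(self.n_folds):
            yield self.idxs[k * fold_size:(k + 1) * fold_size]


kf = KFoldSketch(4, n_folds=2, seed=0)
rounds = [list(kf), list(kf)]  # two separate generators over one object
```

Because neither __iter__ nor _iter_test_indices touches the random number generator in this sketch, both entries of rounds contain identical folds.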

