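I have a training dataset (the CIFAR-10 dataset) in the form of a matrix of size 5000 x 3027. Using array_split in numpy, I partitioned it into 5 parts, and I want to hold out just one of them as the cross-validation fold. My problem comes when I use something like XTrain[[indexes]], where indexes is an array such as [0,1,2,3]: doing this gives me a 3D tensor of size 4 x 1000 x 3027 rather than a matrix. How can I collapse the "4 x 1000" into 4000 rows, to get a matrix of size 4000 x 3027?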
for fold in range(len(X_train_folds)):
    indexes = np.delete(np.arange(len(X_train_folds)), fold)
    XTrain = X_train_folds[indexes]
    X_cv = X_train_folds[fold]
    yTrain = y_train_folds[indexes]
    y_cv = y_train_folds[fold]
    classifier.train(XTrain, yTrain)
    dists = classifier.compute_distances_no_loops(X_cv)
    y_test_pred = classifier.predict_labels(dists, k)
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct/num_test)
    k_to_accuracy[k] = accuracy
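The shape issue described above can be reproduced on a small scale. This is only a sketch: the 50 x 7 array is a synthetic stand-in for the real 5000 x 3027 data, and it assumes the list returned by np.array_split was converted to a NumPy array before being indexed.

```python
import numpy as np

# Synthetic stand-in for the training matrix (assumption: 50 samples, 7 features).
X_train = np.zeros((50, 7))

# np.array_split returns a list of 5 arrays; wrapping it in np.array
# stacks them into a single array of shape (5, 10, 7).
X_train_folds = np.array(np.array_split(X_train, 5))

fold = 0
indexes = np.delete(np.arange(len(X_train_folds)), fold)  # the 4 remaining folds

# Indexing the stacked array with an index array keeps the fold axis,
# so the result is 3-D rather than a 2-D matrix.
stacked = X_train_folds[indexes]
print(stacked.shape)  # (4, 10, 7)
```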
Best answer
I would suggest using the scikit-learn package. It already comes with many common machine learning tools, such as a K-fold cross-validation generator:
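>>> from sklearn.model_selection import KFold
>>> X = ...  # your data, shape [samples x features]
>>> y = ...  # ground-truth labels
>>> kf = KFold(n_splits=5)
Then iterate over kf.split(X):
>>> for train_index, test_index in kf.split(X):
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
...     # do something
The loop above runs n_splits times, each time with a different set of training and test indices. Because train_index is a flat array of row indices, X[train_index] is already a 2D matrix of shape [training samples x features], so no 3D tensor appears.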
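If you would rather stay with the folds produced by np.array_split, the remaining folds can be joined back into one matrix directly. The following is a minimal sketch rather than a definitive implementation: the 20 x 6 data and the helper name split_fold are made up for illustration and do not come from the question.

```python
import numpy as np

# Synthetic stand-in data (assumption: 20 samples, 6 features, 4 classes).
X_train = np.arange(20 * 6, dtype=np.float64).reshape(20, 6)
y_train = np.arange(20) % 4

X_train_folds = np.array_split(X_train, 5)  # list of 5 arrays, each (4, 6)
y_train_folds = np.array_split(y_train, 5)  # list of 5 arrays, each (4,)


def split_fold(X_folds, y_folds, fold):
    """Hypothetical helper: hold out `fold`, join the other folds row-wise."""
    X_tr = np.concatenate([f for i, f in enumerate(X_folds) if i != fold], axis=0)
    y_tr = np.concatenate([f for i, f in enumerate(y_folds) if i != fold], axis=0)
    return X_tr, y_tr, X_folds[fold], y_folds[fold]


X_tr, y_tr, X_cv, y_cv = split_fold(X_train_folds, y_train_folds, fold=0)
print(X_tr.shape, X_cv.shape)  # (16, 6) (4, 6)

# Alternative when the folds are already stacked into a 3-D array:
# merge the first two axes with reshape, keeping the feature axis.
stacked = np.array(X_train_folds)[[1, 2, 3, 4]]        # shape (4, 4, 6)
flat = stacked.reshape(-1, stacked.shape[-1])          # shape (16, 6)
```

Note that the reshape variant only works when every fold has the same number of rows, whereas np.concatenate also handles the uneven folds that np.array_split can produce.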