How is the scikit-learn cross_val_predict accuracy score calculated?

Question

Does cross_val_predict (see the documentation, v0.18) with the k-fold method shown in the code below calculate the accuracy for each fold and then average them?

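from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

cv = KFold(n_splits=20)
clf = SVC()
ypred = cross_val_predict(clf, td, labels, cv=cv)
accuracy = accuracy_score(labels, ypred)
print(accuracy)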

Recommended answer

No, it does not!

According to the cross-validation documentation page, cross_val_predict does not return any scores. It returns only the predicted labels, obtained with a specific strategy: the prediction for each sample comes from a model fitted on the folds that do not contain that sample.

So by calling accuracy_score(labels, ypred) you are just calculating the accuracy of the labels predicted by that strategy against the true labels, in a single pass over all samples. The same documentation page shows this usage:

predicted = cross_val_predict(clf, iris.data, iris.target, cv=10) 
metrics.accuracy_score(iris.target, predicted)

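Note that the result of this computation may differ slightly from the one obtained with cross_val_score, because the elements are grouped in different ways.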
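The difference between the two calculations can be seen in a small sketch. It assumes the iris data set from the documentation excerpt above, and the 4-fold split and random_state are arbitrary choices made only so that the folds have unequal sizes. Pooling all predictions into one accuracy_score call weights each fold by its size, while scores.mean() weights every fold equally.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_predict, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 150 samples in 4 folds gives test parts of 38, 38, 37 and 37 samples.
# A fixed random_state makes every call below see the same splits.
cv = KFold(n_splits=4, shuffle=True, random_state=0)
clf = SVC()

# One accuracy over the merged out-of-fold predictions.
pooled = accuracy_score(y, cross_val_predict(clf, X, y, cv=cv))

# One accuracy per fold (the default score of a classifier is accuracy).
fold_scores = cross_val_score(clf, X, y, cv=cv)

# Weighting each fold's accuracy by its test size reproduces the pooled value.
fold_sizes = np.array([len(test_index) for _, test_index in cv.split(X)])
weighted = np.average(fold_scores, weights=fold_sizes)

print("pooled accuracy:        ", pooled)
print("mean of fold accuracies:", fold_scores.mean())
print("size-weighted mean:     ", weighted)
```

When all folds have the same size, the pooled accuracy and the plain mean coincide; they drift apart only when the fold sizes differ.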

If you need the accuracy score of each fold, you should try:

>>> scores = cross_val_score(clf, X, y, cv=cv)
>>> scores                                              
array([ 0.96...,  1.  ...,  0.96...,  0.96...,  1.        ])

Then, for the mean accuracy over all folds, use scores.mean():

>>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
Accuracy: 0.98 (+/- 0.03)

How to calculate the Cohen kappa coefficient and confusion matrix for each fold?

To calculate the Cohen kappa coefficient and the confusion matrix, I assume you mean the kappa coefficient and confusion matrix between the true labels and each fold's predicted labels:

from sklearn.metrics import cohen_kappa_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# X (features) and labels are assumed to be NumPy arrays of equal length.
cv = KFold(n_splits=20)
clf = SVC()
for train_index, test_index in cv.split(X):
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    # Score this fold only: true labels vs. predictions on its test part.
    kappa_score = cohen_kappa_score(labels[test_index], ypred)
    # Named cm so that the confusion_matrix function is not overwritten.
    cm = confusion_matrix(labels[test_index], ypred)
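The loop above overwrites its results on every iteration. A minimal sketch of keeping them per fold follows, assuming the iris data set as stand-in data and a 5-fold shuffled split (both are illustrative choices, not part of the original question). Passing labels=classes to confusion_matrix is an addition here: it keeps every matrix the same shape even if a fold happens to miss a class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = np.unique(y)  # fixed class order for every confusion matrix

cv = KFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC()

kappas = []
matrices = []
for train_index, test_index in cv.split(X):
    clf.fit(X[train_index], y[train_index])
    ypred = clf.predict(X[test_index])
    # Kappa and confusion matrix for this fold's test part only.
    kappas.append(cohen_kappa_score(y[test_index], ypred))
    matrices.append(confusion_matrix(y[test_index], ypred, labels=classes))

# Summing the per-fold matrices counts every sample exactly once.
total_matrix = np.sum(matrices, axis=0)

for fold, (kappa, matrix) in enumerate(zip(kappas, matrices), start=1):
    print("fold", fold, "kappa:", kappa)
    print(matrix)
print("sum of all fold matrices:")
print(total_matrix)
```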

What does cross_val_predict return?

It uses KFold to split the data into k parts and then, for iterations i = 1..k:

1. takes the i-th part as the test data and all other parts as the training data
2. trains the model on the training data (all parts except the i-th)
3. uses the trained model to predict labels for the i-th part (the test data)

In each iteration, the labels of the i-th part of the data get predicted. At the end, cross_val_predict merges all the partial predictions and returns them as the final result.

This code shows the process step by step:

import numpy as np
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.svm import SVC

X = np.array([[0], [1], [2], [3], [4], [5]])
labels = np.array(['a', 'a', 'a', 'b', 'b', 'b'])

cv = KFold(n_splits=3)
clf = SVC()
# Start with empty strings; each fold fills in its own test positions.
ypred_all = np.full(labels.shape, '', dtype=labels.dtype)
for i, (train_index, test_index) in enumerate(cv.split(X), start=1):
    print("iteration", i, ":")
    print("train indices:", train_index)
    print("train data:", X[train_index])
    print("test indices:", test_index)
    print("test data:", X[test_index])
    clf.fit(X[train_index], labels[train_index])
    ypred = clf.predict(X[test_index])
    print("predicted labels for data of indices", test_index, "are:", ypred)
    ypred_all[test_index] = ypred
    print("merged predicted labels:", ypred_all)
    print("=====================================")
y_cross_val_predict = cross_val_predict(clf, X, labels, cv=cv)
print("predicted labels by cross_val_predict:", y_cross_val_predict)

The result is:

iteration 1 :
train indices: [2 3 4 5]
train data: [[2] [3] [4] [5]]
test indices: [0 1]
test data: [[0] [1]]
predicted labels for data of indices [0 1] are: ['b' 'b']
merged predicted labels: ['b' 'b' '' '' '' '']
=====================================
iteration 2 :
train indices: [0 1 4 5]
train data: [[0] [1] [4] [5]]
test indices: [2 3]
test data: [[2] [3]]
predicted labels for data of indices [2 3] are: ['a' 'b']
merged predicted labels: ['b' 'b' 'a' 'b' '' '']
=====================================
iteration 3 :
train indices: [0 1 2 3]
train data: [[0] [1] [2] [3]]
test indices: [4 5]
test data: [[4] [5]]
predicted labels for data of indices [4 5] are: ['a' 'a']
merged predicted labels: ['b' 'b' 'a' 'b' 'a' 'a']
=====================================
predicted labels by cross_val_predict: ['b' 'b' 'a' 'b' 'a' 'a']
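Because the merged result needs exactly one prediction per sample, cross_val_predict only accepts splitters whose test parts form a partition of the data. The sketch below illustrates this under stated assumptions: the 12-sample toy data and the ShuffleSplit settings are made up for the illustration. KFold puts every index in a test part exactly once, whereas ShuffleSplit draws its test sets independently, so cross_val_predict rejects it with a ValueError.

```python
import numpy as np
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_predict
from sklearn.svm import SVC

# Hypothetical toy data: 12 one-dimensional samples, two classes.
X = np.arange(12, dtype=float).reshape(-1, 1)
y = np.array(["a"] * 6 + ["b"] * 6)

# With KFold, each sample lands in a test part exactly once.
kfold = KFold(n_splits=3, shuffle=True, random_state=0)
test_counts = np.zeros(len(y), dtype=int)
for _, test_index in kfold.split(X):
    test_counts[test_index] += 1
print("times each sample is tested with KFold:", test_counts)

# ShuffleSplit yields 3 independent test sets of 3 samples each, which
# cannot cover all 12 samples, so there is nothing consistent to merge.
shuffle_split = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)
try:
    cross_val_predict(SVC(), X, y, cv=shuffle_split)
    raised = False
except ValueError as exc:
    raised = True
    print("ShuffleSplit rejected:", exc)
```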
