python - sklearn中的留一法交叉验证的ROC曲线

我想使用留一法交叉验证来绘制分类器的ROC曲线。

似乎已经问过类似的问题，但没有任何答案。

在另一个问题中，陈述了here：

  为了使用LeaveOneOut获得有意义的ROC AUC，您需要
  计算每个折的概率估计（每个折仅由
  一个观察值），然后根据所有这些计算出ROC AUC
  概率估计。

此外，在scikit-learn官方网站上有一个类似的示例，但使用KFold交叉验证（here）。

因此，对于遗忘式交叉验证案例，我正在考虑收集测试集上的所有概率预测（一次是一个样本），并在获得所有折叠的预测概率之后，计算并绘制ROC曲线。

这看起来还好吗？我看不到其他实现目标的方法。

这是我的代码：

from sklearn.svm import SVC
import numpy as np, matplotlib.pyplot as plt,  pandas as pd
from sklearn.model_selection import cross_val_score,cross_val_predict,  KFold,  LeaveOneOut, StratifiedKFold
from sklearn.metrics import roc_curve, auc
from sklearn import datasets

# Import some data to play with
iris = datasets.load_iris()
X_svc = iris.data
y = iris.target
X_svc, y = X_svc[y != 2], y[y != 2]

clf = SVC(kernel='linear', class_weight='balanced', probability=True, random_state=0)
kf = LeaveOneOut()

all_y = []
all_probs=[]
for train, test in kf.split(X_svc, y):
    all_y.append(y[test])
    all_probs.append(clf.fit(X_svc[train], y[train]).predict_proba(X_svc[test])[:,1])
all_y = np.array(all_y)
all_probs = np.array(all_probs)

fpr, tpr, thresholds = roc_curve(all_y,all_probs)
roc_auc = auc(fpr, tpr)
plt.figure(1, figsize=(12,6))
plt.plot(fpr, tpr, lw=2, alpha=0.5, label='LOOCV ROC (AUC = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k', label='Chance level', alpha=.8)
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.grid()
plt.show()

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-auto-examples-model-selection-plot-roc-crossval-py

最佳答案

我相信代码是正确的，拆分也是如此。我为实现和结果的验证添加了几行：

from sklearn.model_selection import cross_val_score,cross_val_predict,  KFold,  LeaveOneOut, StratifiedKFold
from sklearn.metrics import roc_curve, auc
from sklearn import datasets

# Import some data to play with
iris = datasets.load_iris()
X_svc = iris.data
y = iris.target
X_svc, y = X_svc[y != 2], y[y != 2]

clf = SVC(kernel='linear', class_weight='balanced', probability=True, random_state=0)
kf = LeaveOneOut()
if kf.get_n_splits(X_svc) == len(X_svc):
    print("They are the same length, splitting correct")
else:
    print("Something is wrong")
all_y = []
all_probs=[]
for train, test in kf.split(X_svc, y):
    all_y.append(y[test])
    all_probs.append(clf.fit(X_svc[train], y[train]).predict_proba(X_svc[test])[:,1])
all_y = np.array(all_y)
all_probs = np.array(all_probs)
#print(all_y) #For validation
#print(all_probs) #For validation

fpr, tpr, thresholds = roc_curve(all_y,all_probs)
print(fpr, tpr, thresholds) #For validation
roc_auc = auc(fpr, tpr)
plt.figure(1, figsize=(12,6))
plt.plot(fpr, tpr, lw=2, alpha=0.5, label='LOOCV ROC (AUC = %0.2f)' % (roc_auc))
plt.plot([0, 1], [0, 1], linestyle='--', lw=2, color='k', label='Chance level', alpha=.8)
plt.xlim([-0.05, 1.05])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.grid()
plt.show()

If行旨在仅确保将拆分进行了n次，其中n是给定数据集的观察次数。这是因为如文档所述，LeaveOneOut的作用与Kfold(n_splits=n) and LeaveOneOut(p=1)相同。
同样，在打印预测的Proba值时，它们很好，可以理解曲线。恭喜您的1.00AUC！

关于python - sklearn中的留一法交叉验证的ROC曲线，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/57756804/