本文介绍了如何使用从CSV文件转换的数据框绘制ROC曲线的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图使用sklearn提供的文档来绘制ROC曲线.我的数据在CSV文件中,看起来像这样.它有两个类:好"和差"

I was trying to plot a ROC curve by using the documentation provided by sklearn. My data is in a CSV file, and it looks like this.It has two classes 'Good'and 'Bad'

我的CSV文件的屏幕截图

screenshot of my CSV file

我的代码如下

import numpy as np
import matplotlib.pyplot as plt
from itertools import cycle
import sys
from sklearn import svm, datasets
from sklearn.metrics import roc_curve, auc
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from scipy import interp
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB

# Import some data to play with
df = pd.read_csv("E:\\autodesk\\TTI ROC curve.csv")
X =df[['TTI','Max TemperatureF','Mean TemperatureF','Min TemperatureF',' Min Humidity']].values
y = df['TTI_Category'].as_matrix()
# Binarize the output
y = label_binarize(y, classes=['Good','Bad'])
n_classes = y.shape[1]

# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5,
                                                    random_state=0)

# Learn to predict each class against the other
classifier = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True,
                                 random_state=random_state))
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
plt.figure()
lw = 2
plt.plot(fpr[2], tpr[2], color='darkorange',
         lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])
plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic example')
plt.legend(loc="lower right")
plt.show()enter code here

如果我运行此代码,则系统告诉我random_state未定义.所以我将其更改为random_state = true.然后系统告诉我

If i run this code the system told me random_state is not defined. so I changed it to random_state=true. Then the system told me

plt.plot(fpr[2], tpr[2], color='darkorange', KeyError: 2 <matplotlib.figure.Figure at 0xd8bff60>

如果我打印出n_classes.系统告诉我它是"1",如果我在文档中打印出n_classes,它会显示3.我不确定这是否是问题所在.有人对此追溯有答案吗?

if I print out n_classes. The system told me it's "1", and if I print out the n_classes in the documentation it says 3. I'm not sure if that's where the problem is. Does anyone have answer to this traceback?

推荐答案

看起来您根本不了解数据的结构和代码的工作方式.

Looks like you simply don't understand how your data is structured and how your code should work.

LabelBinarizer将返回一个全是"的编码,这意味着对于两个类,您将获得以下映射:['good', 'bad', 'good'] -> [[1], [0], [1]],s.t. n_classes = 1.

LabelBinarizer will return a one-v-all encoding, meaning that for two classes you will get the following mapping: ['good', 'bad', 'good'] -> [[1], [0], [1]], s.t. n_classes = 1.

如果您有2个班级,为什么会期望它是3个?只需将plt.plot(fpr[2], tpr[2], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2])更改为plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0]),就可以了.

Why would you expect it to be 3 if you have 2 classes?Simply change plt.plot(fpr[2], tpr[2], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[2]) to plt.plot(fpr[0], tpr[0], color='darkorange', lw=lw, label='ROC curve (area = %0.2f)' % roc_auc[0]) and you should be good.

这篇关于如何使用从CSV文件转换的数据框绘制ROC曲线的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

07-31 05:44