How to plot precision and recall for a multiclass classifier?


Problem description

I'm using scikit-learn and I want to plot precision and recall curves. The classifier I'm using is RandomForestClassifier. All the resources in the scikit-learn documentation use binary classification. Also, can I plot a ROC curve for a multiclass problem?

Also, the only example I found is an SVM for the multilabel case, and it relies on a decision_function, which RandomForest doesn't have.

Solution

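From the scikit-learn documentation (paraphrased):

Precision-Recall: precision-recall curves are typically used in binary classification. Extending them to multi-class or multi-label classification requires binarizing the output.

Receiver Operating Characteristic (ROC): ROC curves are likewise typically used in binary classification. Extending them to multi-class or multi-label classification also requires binarizing the output.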

Therefore, you should binarize the output and compute the precision-recall and ROC curves for each class separately. Since RandomForestClassifier has no decision_function, use predict_proba to get the class probabilities and pass them as the scores.
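To make "binarize the output" concrete, here is a minimal sketch of what label_binarize produces. The toy labels are made up for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Toy labels for a hypothetical 3-class problem (illustrative values only).
y = np.array([0, 2, 1, 2])

# Each column becomes a binary "this class vs. the rest" target.
Y = label_binarize(y, classes=[0, 1, 2])
print(Y)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Column i of this matrix is the binary ground truth that gets paired with column i of the predict_proba output when computing a per-class curve.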

I divide the code into three parts:

general settings, learning and prediction
precision-recall curve
ROC curve

1. general settings, learning and prediction

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import precision_recall_curve, roc_curve
from sklearn.preprocessing import label_binarize

import matplotlib.pyplot as plt
#%matplotlib inline

# fetch_mldata has been removed from scikit-learn; fetch_openml provides MNIST instead
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
# OpenML returns the labels as strings, so convert them to integers
target = mnist.target.astype(int)
n_classes = len(set(target))

# one binary indicator column per class
Y = label_binarize(target, classes=[*range(n_classes)])

X_train, X_test, y_train, y_test = train_test_split(mnist.data,
                                                    Y,
                                                    random_state = 42)

# one random forest per class (one-vs-rest)
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50,
                             max_depth=3,
                             random_state=0))
clf.fit(X_train, y_train)

# class probabilities, shape (n_samples, n_classes), used as scores below
y_score = clf.predict_proba(X_test)
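As a possible variation, RandomForestClassifier handles multiclass labels natively, so the OneVsRestClassifier wrapper is not strictly required to get per-class scores. The sketch below assumes the small built-in digits dataset (load_digits) instead of MNIST, purely to keep it fast. It fits on the integer labels and binarizes only the test labels.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Small built-in digits dataset, used here only to keep the sketch fast.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Fit on the original integer labels; no OneVsRestClassifier wrapper.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_tr, y_tr)

# One probability column per class, ordered as in rf.classes_.
proba = rf.predict_proba(X_te)

# Binarize only the test labels so they line up with the columns of proba.
y_te_bin = label_binarize(y_te, classes=rf.classes_)
print(proba.shape, y_te_bin.shape)
```

The two approaches are not identical. The wrapper trains one forest per class, while the plain forest trains a single model whose probabilities sum to 1 across classes. Either output can be fed column by column to precision_recall_curve or roc_curve.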

2. precision-recall curve

# precision recall curve
precision = dict()
recall = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i],
                                                        y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))

plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("precision vs. recall curve")
plt.show()
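Ten curves on one plot can be hard to compare, so it may help to summarize each curve with its average precision and to add a micro-averaged curve. The following is a sketch under assumptions, not part of the original answer. It uses the small load_digits dataset instead of MNIST, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Small stand-in dataset (assumption: digits instead of MNIST, for speed).
X, y = load_digits(return_X_y=True)
classes = np.unique(y)
Y = label_binarize(y, classes=classes)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=42)

clf = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
)
clf.fit(X_tr, Y_tr)
scores = clf.predict_proba(X_te)

# Average precision summarizes each per-class precision-recall curve.
ap = {
    int(c): average_precision_score(Y_te[:, i], scores[:, i])
    for i, c in enumerate(classes)
}

# Micro-average: treat every (sample, class) cell as one binary prediction.
micro_precision, micro_recall, _ = precision_recall_curve(
    Y_te.ravel(), scores.ravel()
)
micro_ap = average_precision_score(Y_te, scores, average="micro")

for c, value in ap.items():
    print("class {}: AP = {:.3f}".format(c, value))
print("micro-average AP = {:.3f}".format(micro_ap))
```

The micro-averaged arrays can be plotted with plt.plot(micro_recall, micro_precision) in the same way as the per-class curves.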

3. ROC curve

# roc curve
fpr = dict()
tpr = dict()

for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i],
                                  y_score[:, i])
    plt.plot(fpr[i], tpr[i], lw=2, label='class {}'.format(i))

plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend(loc="best")
plt.title("ROC curve")
plt.show()
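The ROC curves can also be reduced to a single number per class, the area under the curve (AUC). The sketch below is an illustration under assumptions. It uses load_digits and a plain RandomForestClassifier rather than the MNIST setup above. It computes one AUC per class with roc_curve and auc, then a macro-averaged one-vs-rest AUC with roc_auc_score.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Small stand-in dataset (assumption: digits instead of MNIST, for speed).
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)

# Binarized test labels, one column per class in rf.classes_ order.
y_bin = label_binarize(y_te, classes=rf.classes_)

# Area under each one-vs-rest ROC curve.
per_class_auc = {}
for i, c in enumerate(rf.classes_):
    fpr_i, tpr_i, _ = roc_curve(y_bin[:, i], proba[:, i])
    per_class_auc[int(c)] = auc(fpr_i, tpr_i)

# Macro-averaged one-vs-rest AUC computed directly from the integer labels.
macro_auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")

for c, value in per_class_auc.items():
    print("class {}: AUC = {:.3f}".format(c, value))
print("macro OvR AUC = {:.3f}".format(macro_auc))
```

With multi_class="ovr", roc_auc_score expects scores that sum to 1 across classes, which the predict_proba output of a single forest satisfies.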
