Problem description
I'm using scikit-learn, and I want to plot precision and recall curves. The classifier I'm using is RandomForestClassifier. All the resources in the scikit-learn documentation use binary classification. Also, can I plot a ROC curve for multiclass?
Also, I only found an example for multilabel with SVM, and it has a decision_function, which RandomForest doesn't have.
Solution
According to the scikit-learn documentation, precision-recall and ROC curves are typically used in binary classification; to extend them to multiclass or multilabel classification, it is necessary to binarize the output.
Therefore, you should binarize the output and consider precision-recall and ROC curves for each class. Moreover, you are going to use predict_proba to get class probabilities.
I divide the code into three parts:
- general settings, learning and prediction
- precision-recall curve
- ROC curve
1. general settings, learning and prediction
# fetch_mldata has been removed from scikit-learn; fetch_openml replaces it
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.metrics import precision_recall_curve, roc_curve
from sklearn.preprocessing import label_binarize
import matplotlib.pyplot as plt
# %matplotlib inline

# Load MNIST; OpenML returns the labels as strings, so cast them to int
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
target = mnist.target.astype(int)
n_classes = len(set(target))

# Binarize the labels: one indicator column per class
Y = label_binarize(target, classes=[*range(n_classes)])

X_train, X_test, y_train, y_test = train_test_split(mnist.data, Y, random_state=42)

# One binary random forest per class
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0))
clf.fit(X_train, y_train)

# Per-class probabilities, shape (n_samples, n_classes)
y_score = clf.predict_proba(X_test)
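To make the binarization step concrete, here is a minimal sketch on a toy label vector. The labels and the name Y_toy are made up for illustration and are not part of the original answer.

```python
import numpy as np
from sklearn.preprocessing import label_binarize

# Toy labels for a 3-class problem (illustrative only)
labels = np.array([0, 2, 1, 2])

# Each row gets a 1 in the column of its class and 0 elsewhere
Y_toy = label_binarize(labels, classes=[0, 1, 2])
print(Y_toy)
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 0 1]]
```

Column i of this matrix is the binary target that curve i is computed against. Note that with only two classes label_binarize returns a single column, so the per-class loop used here is meant for three or more classes.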
2. precision-recall curve
# precision-recall curve, one per class
precision = dict()
recall = dict()
for i in range(n_classes):
    precision[i], recall[i], _ = precision_recall_curve(y_test[:, i], y_score[:, i])
    plt.plot(recall[i], precision[i], lw=2, label='class {}'.format(i))

plt.xlabel("recall")
plt.ylabel("precision")
plt.legend(loc="best")
plt.title("precision vs. recall curve")
plt.show()
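If a single number per curve is useful, one option is to summarize each precision-recall curve with average precision. The sketch below is an addition, not part of the original answer. It assumes the small bundled digits dataset as a stand-in for MNIST so that it runs without a download, and the names ap_per_class and ap_micro are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Assumption: load_digits (10 classes, 8x8 images) stands in for MNIST
digits = load_digits()
n_classes = 10
Y = label_binarize(digits.target, classes=list(range(n_classes)))
X_train, X_test, y_train, y_test = train_test_split(digits.data, Y, random_state=42)

clf = OneVsRestClassifier(
    RandomForestClassifier(n_estimators=50, max_depth=3, random_state=0)
)
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)

# Average precision summarizes each class's precision-recall curve
ap_per_class = {
    i: average_precision_score(y_test[:, i], y_score[:, i])
    for i in range(n_classes)
}

# Micro-average: every cell of the indicator matrix counts as one binary prediction
ap_micro = average_precision_score(y_test, y_score, average="micro")

for i, ap in ap_per_class.items():
    print(f"class {i}: AP = {ap:.3f}")
print(f"micro-average AP = {ap_micro:.3f}")
```

The per-class values could be added to the legend labels of the plot above; the micro-average gives one overall figure when ten separate curves are too many to compare.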
3. ROC curve
# ROC curve, one per class
fpr = dict()
tpr = dict()
for i in range(n_classes):
    # the original line had an extra closing parenthesis here
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], y_score[:, i])
    plt.plot(fpr[i], tpr[i], lw=2, label='class {}'.format(i))

plt.xlabel("false positive rate")
plt.ylabel("true positive rate")
plt.legend(loc="best")
plt.title("ROC curve")
plt.show()
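On the question's point that RandomForest has no decision_function: a plain RandomForestClassifier trained on the original (non-binarized) labels also returns one probability column per class from predict_proba, so a variant without OneVsRestClassifier seems possible. The sketch below is an assumption-laden alternative, not the method of the original answer. It uses the bundled digits dataset, and only the test labels are binarized.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: load_digits stands in for MNIST to keep the sketch offline
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=42
)

# Train directly on the multiclass labels, no one-vs-rest wrapper
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Columns of predict_proba follow forest.classes_,
# so binarize the test labels in the same order
y_score = forest.predict_proba(X_test)
y_test_bin = label_binarize(y_test, classes=forest.classes_)

# One ROC curve and its area per class
roc_auc = {}
for i, cls in enumerate(forest.classes_):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
    roc_auc[int(cls)] = auc(fpr, tpr)
    print(f"class {cls}: AUC = {roc_auc[int(cls)]:.3f}")
```

The difference from the one-vs-rest setup is that a single forest is trained and its class probabilities sum to 1 per sample, whereas OneVsRestClassifier fits an independent binary forest per class.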