Problem description
I am training an ML logistic regression classifier to classify two classes using Python scikit-learn. The data is extremely imbalanced (about 14300:1). I'm getting almost 100% accuracy and ROC-AUC, but 0% precision, recall, and F1 score. I understand that accuracy is usually not useful with very imbalanced data, but why is the ROC-AUC measure close to perfect as well?
from sklearn.metrics import roc_curve, auc
# Get the ROC curve from the real-valued scores, not from the hard predictions
y_score = classifierUsed2.decision_function(X_test)
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(false_positive_rate, true_positive_rate)
print 'AUC=', roc_auc
1= class1
0= class2
Class count:
0 199979
1 21
Accuracy: 0.99992
Classification report:
precision recall f1-score support
0 1.00 1.00 1.00 99993
1 0.00 0.00 0.00 7
avg / total 1.00 1.00 1.00 100000
Confusion matrix:
[[99992 1]
[ 7 0]]
AUC= 0.977116255281
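For reference, the accuracy, classification report and confusion matrix shown above can be produced roughly like this (a minimal sketch in the same Python 2 style; classifierUsed2, X_test and y_test are the objects from the snippet above):

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hard class labels at the classifier's default decision threshold
y_pred = classifierUsed2.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, y_pred)
print 'Classification report:'
print classification_report(y_test, y_pred)
print 'Confusion matrix:'
print confusion_matrix(y_test, y_pred)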
The above is using logistic regression; below is using a decision tree. The confusion matrix looks almost identical, but the AUC is very different.
1= class1
0= class2
Class count:
0 199979
1 21
Accuracy: 0.99987
Classification report:
precision recall f1-score support
0 1.00 1.00 1.00 99989
1 0.00 0.00 0.00 11
avg / total 1.00 1.00 1.00 100000
Confusion matrix:
[[99987 2]
[ 11 0]]
AUC= 0.4999899989
Recommended answer
One must understand the crucial difference between AUC-ROC and "point-wise" metrics like accuracy, precision, etc. ROC is a function of a threshold. Given a model (classifier) that outputs the probability of belonging to each class, we usually predict the class with the highest probability (support). However, sometimes we can get better scores by changing this rule, e.g. requiring one support to be 2 times bigger than the other before actually assigning a point to that class; this is often the case for imbalanced datasets. In doing so you are effectively modifying the learned class prior to better fit your data. ROC looks at "what would happen if I changed this threshold to every possible value", and AUC-ROC computes the integral of that curve.
Consequently:
- High AUC-ROC vs. low F1 (or another "point" metric) means that your classifier currently does a bad job, but you can find a threshold at which its score is actually pretty decent (see the sketch after this list).
- Low AUC-ROC and low F1 (or another "point" metric) means that your classifier currently does a bad job, and even fitting a threshold will not change that.
- High AUC-ROC and high F1 (or another "point" metric) means that your classifier currently does a decent job, and for many other threshold values it would do the same.
- Low AUC-ROC vs. high F1 (or another "point" metric) means that your classifier currently does a decent job, but for many other threshold values it is pretty bad.
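To illustrate the first case, here is a minimal sketch of such a threshold search, assuming a fitted classifier with a decision_function (like the logistic regression above) and the X_test / y_test from the question; find_best_threshold is just an illustrative helper, not a scikit-learn function:

from sklearn.metrics import f1_score, roc_curve

def find_best_threshold(y_true, y_score):
    # Candidate thresholds come from the ROC computation: every score value
    # at which the set of predicted positives changes.
    _, _, thresholds = roc_curve(y_true, y_score)
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        y_pred = (y_score >= t).astype(int)  # predict the rare class 1 above the threshold
        f1 = f1_score(y_true, y_pred)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

y_score = classifierUsed2.decision_function(X_test)
best_t, best_f1 = find_best_threshold(y_test, y_score)
print 'best threshold =', best_t, ', best f1 =', best_f1

If the best F1 found this way is clearly higher than the F1 at the default threshold, you are in the first situation: the ranking the model produces is good (hence the high AUC-ROC), and only the default decision threshold is poorly placed for such an imbalanced class prior.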