import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc, roc_auc_score
import numpy as np

correct_classification = np.array([0, 1])
predicted_classification = np.array([1, 1])

false_positive_rate, true_positive_rate, thresholds = roc_curve(correct_classification, predicted_classification)
print(false_positive_rate)
print(true_positive_rate)

From https://en.wikipedia.org/wiki/Sensitivity_and_specificity :

True positive: Sick people correctly identified as sick
False positive: Healthy people incorrectly identified as sick
True negative: Healthy people correctly identified as healthy
False negative: Sick people incorrectly identified as healthy

I'm using these values: 0 = sick, 1 = healthy.

From https://en.wikipedia.org/wiki/False_positive_rate :

false positive rate = false positives / (false positives + true negatives)

number of false positives: 0
number of true negatives: 1

therefore false positive rate = 0 / (0 + 1) = 0

Reading the return values of roc_curve (http://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve):

fpr : array, shape = [>2]
    Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].

tpr : array, shape = [>2]
    Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].

thresholds : array, shape = [n_thresholds]
    Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.

How does this differ from my manual calculation of the false positive rate? How are the thresholds set? Some more information on thresholds is given here: https://datascience.stackexchange.com/questions/806/advantages-of-auc-vs-standard-accuracy but I'm confused about how it fits with this implementation.

Solution

First, Wikipedia is treating sick as the positive class (sick = 1):

True positive: Sick people correctly identified as sick

Second, every model applies some threshold to the predicted probability of the positive class (commonly 0.5). So if the threshold is 0.1, every sample with a probability greater than 0.1 is classified as positive.
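For instance, here is a minimal sketch of that idea; the probability values below are made up purely for illustration:

import numpy as np

# Hypothetical predicted probabilities of the positive class for four samples.
scores = np.array([0.05, 0.40, 0.60, 0.95])

for threshold in (0.1, 0.5, 0.9):
    # A sample is called positive when its score is at least the threshold.
    predicted = (scores >= threshold).astype(int)
    print(threshold, predicted)

The same fixed scores give a different positive/negative split as the threshold moves, and that is exactly what an ROC curve traces out.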
The probabilities of the predicted samples are fixed; it is the threshold that is varied. In roc_curve, scikit-learn sweeps the threshold from 0 (or the minimum value, at which all predictions are positive) to 1 (or the last point, at which all predictions become negative). The intermediate points are placed wherever a prediction changes from positive to negative.

Example:

Sample 1    0.2
Sample 2    0.3
Sample 3    0.6
Sample 4    0.7
Sample 5    0.8

The lowest probability here is 0.2, so the smallest threshold that makes any sense is 0.2. As we keep increasing the threshold, and because there are so few points in this example, the threshold changes at every probability (and is equal to that probability, because that is the point at which the counts of positives and negatives change):

                        Negative    Positive
             < 0.2         0           5
Threshold1  >= 0.2         1           4
Threshold2  >= 0.3         2           3
Threshold3  >= 0.6         3           2
Threshold4  >= 0.7         4           1
Threshold5  >= 0.8         5           0
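To watch roc_curve pick its thresholds from these scores, here is a sketch; the ground-truth labels are invented for illustration, since the example above only gives the scores:

import numpy as np
from sklearn.metrics import roc_curve

# Scores from the example above; the labels are an assumption (1 = positive class).
y_true = np.array([0, 0, 1, 0, 1])
y_score = np.array([0.2, 0.3, 0.6, 0.7, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score, drop_intermediate=False)
# The thresholds come back in decreasing order, taken from the distinct score values,
# plus one extra leading threshold so the curve starts at the (0, 0) point.
print(thresholds)
print(fpr)
print(tpr)

Each threshold corresponds to one (fpr, tpr) point on the ROC curve, which is why roc_curve returns the three arrays with matching lengths.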
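One last detail that ties this back to the code in the question (an addition for illustration, not part of the original answer): by default roc_curve treats the label 1 as the positive class, whereas the question encodes sick as 0, so scikit-learn's "positive" there is healthy rather than sick. The pos_label argument controls which label counts as positive:

import numpy as np
from sklearn.metrics import roc_curve

correct_classification = np.array([0, 1])    # the question's encoding: 0 = sick, 1 = healthy
predicted_classification = np.array([1, 1])

# Default: label 1 ("healthy" here) is treated as the positive class.
fpr, tpr, thresholds = roc_curve(correct_classification, predicted_classification)
print(fpr, tpr, thresholds)

# pos_label=0 makes label 0 ("sick") the positive class, matching the Wikipedia definitions.
fpr, tpr, thresholds = roc_curve(correct_classification, predicted_classification, pos_label=0)
print(fpr, tpr, thresholds)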