Computing and Plotting PR Curves and ROC Curves

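In a linear model, we form a linear combination of the features to get a linear score and then pick a threshold: a sample with linear score < threshold is classified as negative, and a sample with linear score > threshold is classified as positive. To draw a PR curve, imagine the threshold changing continuously. Start with a very large threshold, so that no sample is predicted positive, and compute recall and precision. Then lower the threshold until exactly one sample is predicted positive and compute recall and precision again; lower it further until two samples are predicted positive, and so on. Each step down adds one more predicted positive, until every sample is predicted positive. Finally, plot the resulting points with recall on the horizontal axis and precision on the vertical axis.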

For example, consider the following five samples:

Actual class	linear score	threshold = 6	threshold = 5	threshold = 4	threshold = 3	threshold = 2	threshold = 1
+	5.2	-	+	+	+	+	+
+	4.45	-	-	+	+	+	+
-	3.5	-	-	-	+	+	+
-	2.45	-	-	-	-	+	+
-	1.65	-	-	-	-	-	+
Precision		0/0	1/1	2/2	2/3	2/4	2/5
Recall		0/2	1/2	2/2	2/2	2/2	2/2
TPR		0/2	1/2	2/2	2/2	2/2	2/2
FPR		0/3	0/3	0/3	1/3	2/3	3/3
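The rows of the table can be reproduced with a short threshold sweep. This is a minimal sketch, not a general-purpose implementation: the function name `sweep` and the 1/0 label encoding are illustrative choices, and the undefined precision 0/0 at threshold 6 is reported as NaN rather than forced to a number.

```python
import numpy as np

# Data copied from the table: 1 = actual positive (+), 0 = actual negative (-).
y_true = np.array([1, 1, 0, 0, 0])
scores = np.array([5.2, 4.45, 3.5, 2.45, 1.65])
thresholds = [6, 5, 4, 3, 2, 1]

def sweep(y_true, scores, thresholds):
    """Return one (threshold, precision, recall, fpr) tuple per threshold.

    `sweep` is a hypothetical helper name; it assumes both classes are present.
    """
    n_pos = int(np.sum(y_true == 1))  # P, fixed for every threshold
    n_neg = int(np.sum(y_true == 0))  # N, fixed for every threshold
    rows = []
    for t in thresholds:
        pred_pos = scores > t  # predicted positive when score > threshold
        tp = int(np.sum(pred_pos & (y_true == 1)))
        fp = int(np.sum(pred_pos & (y_true == 0)))
        # Precision is 0/0 when nothing is predicted positive; report NaN.
        precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
        recall = tp / n_pos  # identical to TPR
        fpr = fp / n_neg
        rows.append((t, precision, recall, fpr))
    return rows

for t, p, r, f in sweep(y_true, scores, thresholds):
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f} FPR={f:.3f}")
```

Only the denominator of precision (TP + FP) changes from one threshold to the next; the denominators of recall and FPR stay at 2 and 3 throughout.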


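In a confusion matrix, the rows are the actual classes and the columns are the classes assigned by the classifier. The standard terms are:

True positive (TP): a correct positive prediction
True negative (TN): a correct negative prediction
False positive (FP): an incorrect positive prediction, a false alarm, a Type I error
False negative (FN): an incorrect negative prediction, a miss, a Type II error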

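Recall: of the samples that are actually positive, the fraction that are predicted positive, TP / (TP + FN).

Precision: of the samples that are predicted positive, the fraction that are actually positive, TP / (TP + FP).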

True positive rate (TPR) = recall
TPR = TP / P = TP / (TP + FN)

False positive rate (FPR)
FPR = FP / N = FP / (FP + TN)
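To make the definitions concrete, the sketch below counts TP, FP, TN and FN for one fixed set of predictions and then applies the formulas. The helper names `confusion_counts` and `rates` are hypothetical, labels are assumed to be encoded as 1 (positive) and 0 (negative), and the predictions correspond to the threshold 3 column of the table.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correct positive prediction
        elif predicted == 1 and actual == 0:
            fp += 1  # false alarm (Type I error)
        elif predicted == 0 and actual == 0:
            tn += 1  # correct negative prediction
        else:
            fn += 1  # miss (Type II error)
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Return (TPR, FPR, precision); assumes both classes are present.

    Precision is None when nothing is predicted positive (0/0).
    """
    tpr = tp / (tp + fn)  # TP / P, same as recall
    fpr = fp / (fp + tn)  # FP / N
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    return tpr, fpr, precision

# Threshold 3 in the table: the first three samples are predicted positive.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
print(rates(*counts))
```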

    import numpy as np
    import matplotlib.pyplot as plt

    # Recall and precision at thresholds 6, 5, 4, 3, 2, 1 (from the table).
    recall = np.array([0, 1/2, 2/2, 2/2, 2/2, 2/2])
    # Precision at threshold 6 is 0/0 (undefined); it is plotted as 0 here.
    precision = np.array([0, 1/1, 2/2, 2/3, 2/4, 2/5])

    plt.figure()
    plt.xlim(0, 1.1)
    plt.ylim(0, 1.1)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.plot(recall, precision)
    plt.show()


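The ROC curve is similar to the PR curve, except that the horizontal and vertical axes become FPR and TPR. Because the denominators of FPR and TPR (the number of negatives and the number of positives) stay fixed as the threshold changes, the curve is easier to draw.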

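Drawing procedure: given m1 positive examples and m2 negative examples, sort the samples by linear score from highest to lowest.

Mark a point at (0, 0), then lower the threshold step by step so that each step adds one more predicted positive.

Let the current point be (x, y). If the newly added sample is a true positive, the next point is (x, y + 1/m1); if it is a false positive, the next point is (x + 1/m2, y).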
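The procedure above can be sketched as a short function. This is an illustration under stated assumptions: `roc_points` is a hypothetical name, labels are encoded as 1 (positive) and 0 (negative), no two samples share the same score, and both classes are present. To avoid accumulating floating-point error, it tracks integer counts and divides by m1 and m2 at each step, which gives the same points as adding 1/m1 or 1/m2 repeatedly.

```python
def roc_points(y_true, scores):
    """Return the ROC points (FPR, TPR), starting at (0, 0).

    Assumes binary labels (1 = positive, 0 = negative), no tied scores,
    and at least one sample of each class.
    """
    m1 = sum(1 for label in y_true if label == 1)  # number of positives
    m2 = len(y_true) - m1                          # number of negatives
    # Visit samples from the highest score to the lowest, which mimics
    # lowering the threshold so that one more sample is predicted positive.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i in order:
        if y_true[i] == 1:
            tp += 1  # true positive: move up by 1/m1
        else:
            fp += 1  # false positive: move right by 1/m2
        points.append((fp / m2, tp / m1))
    return points

# The five samples from the table.
points = roc_points([1, 1, 0, 0, 0], [5.2, 4.45, 3.5, 2.45, 1.65])
for x, y in points:
    print(f"FPR={x:.3f} TPR={y:.3f}")
```

The x and y values of the returned points match the FPR and TPR rows of the table, so they can be passed directly to `plt.plot`.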

    import numpy as np
    import matplotlib.pyplot as plt

    # FPR and TPR at thresholds 6, 5, 4, 3, 2, 1 (from the table).
    FPR = np.array([0/3, 0/3, 0/3, 1/3, 2/3, 3/3])
    TPR = np.array([0/2, 1/2, 2/2, 2/2, 2/2, 2/2])

    plt.figure()
    plt.xlim(-0.1, 1.5)
    plt.ylim(-0.1, 1.5)
    plt.xlabel("FPR")
    plt.ylabel("TPR")
    plt.plot(FPR, TPR)
    plt.show()
