Problem description
I split my data into a Train data set and a Test data set.
I fitted a CART (classification tree) model in R with the rpart package, using only the Train set. Now I want to carry out a ROC analysis using the ROCR package.
The response variable is n.use (1 = yes, 0 = no). This is the error I get:
> Pred2 = prediction(Pred.cart, Test$n.use)
Error in prediction(Pred.cart, Test$n.use) :
**Format of predictions is invalid.**
This is my code. What is the problem? And which type is the right one, "class" or "prob"?
library(rpart)
train.cart = rpart(n.use~., data=Train, method="class")
Pred.cart = predict(train.cart, newdata = Test, type = "class")
Pred2 = prediction(Pred.cart, Test$n.use)
roc.cart = performance(Pred2, "tpr", "fpr")
Recommended answer
The prediction() function from the ROCR package expects the predicted "success" probabilities and the observed factor of failures vs. successes. To obtain the former, apply predict(..., type = "prob") to the rpart object (i.e., not "class"). However, because this returns a matrix of probabilities with one column per response class, you need to select the "success" class column.
As your example, unfortunately, is not reproducible, I'm using the kyphosis data from the rpart package for illustration:
library("rpart")
data("kyphosis", package = "rpart")
rp <- rpart(Kyphosis ~ ., data = kyphosis)
Then you can apply the prediction() function from ROCR. Here, I'm using the in-sample (training) data, but the same can be applied out of sample (test data):
library("ROCR")
pred <- prediction(predict(rp, type = "prob")[, 2], kyphosis$Kyphosis)
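For the out-of-sample case, a minimal sketch might look like the following. The 70/30 split, the seed, and the names idx, train, test, rp_train, prob_test, pred_test and roc_test are assumptions made up for illustration; they are not part of the original question or answer. The "success" column is selected by name ("present") rather than by position.

```r
library("rpart")
library("ROCR")
data("kyphosis", package = "rpart")

# Hypothetical 70/30 split; the seed and proportion are arbitrary choices
set.seed(1)
idx <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test <- kyphosis[-idx, ]

# Fit the classification tree on the training rows only
rp_train <- rpart(Kyphosis ~ ., data = train, method = "class")

# Matrix of class probabilities, one column per level of the response
prob_test <- predict(rp_train, newdata = test, type = "prob")

# Select the "success" column by name and pair it with the observed test labels
pred_test <- prediction(prob_test[, "present"], test$Kyphosis)
roc_test <- performance(pred_test, "tpr", "fpr")
plot(roc_test)
abline(0, 1, lty = 2)
```

In the question's setting the equivalent call would presumably be predict(train.cart, newdata = Test, type = "prob")[, "1"], assuming n.use is coded 0/1 so that the probability columns are named "0" and "1".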
And you can visualize the ROC curve:
plot(performance(pred, "tpr", "fpr"))
abline(0, 1, lty = 2)
Or the accuracy across cutoffs:
plot(performance(pred, "acc"))
Or any of the other plots and summaries supported by ROCR.
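As one example of such a summary, the area under the ROC curve can be extracted with the "auc" measure. This is a small sketch on the same in-sample kyphosis fit; the variable name auc is only illustrative.

```r
library("rpart")
library("ROCR")
data("kyphosis", package = "rpart")

rp <- rpart(Kyphosis ~ ., data = kyphosis)
pred <- prediction(predict(rp, type = "prob")[, "present"], kyphosis$Kyphosis)

# performance() returns an S4 object; the AUC is stored in its y.values slot
auc <- performance(pred, "auc")@y.values[[1]]
auc
```

Because this uses the training data, the value is an optimistic estimate; computing it on held-out test data gives a fairer picture.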