Differences when computing the average AUC with ROCR and pROC (R)

Problem Description

I am working with cross-validation data (10-fold, repeated 5 times) from an SVM-RFE model generated with the caret package. I know that caret works with the pROC package when computing metrics, but I need to use the ROCR package in order to obtain the average ROC. However, I noticed that the average AUC values were not the same with each package, so I am not sure whether I can use the two packages interchangeably.

The code I used to check this is:

library(pROC)   # roc(), auc()
library(ROCR)   # prediction(), performance()

predictions_NG3 <- list()
labels_NG3 <- list()

# Number of predictors selected by the RFE procedure
optSize <- svmRFE_NG3$optsize

# Split the hold-out predictions by subset size, keep the optimal size,
# then split into the 50 resamples (10-fold CV repeated 5 times)
resamples <- split(svmRFE_NG3$pred, svmRFE_NG3$pred$Variables)
resamplesFOLD <- split(resamples[[optSize]], resamples[[optSize]]$Resample)

auc_pROC <- vector()
auc_ROCR <- vector()

for (i in 1:50){
  predictions_NG3[[i]] <- resamplesFOLD[[i]]$LUNG   # predicted probability of class LUNG
  labels_NG3[[i]] <- resamplesFOLD[[i]]$obs         # observed class

  # WITH pROC
  rocCurve <- roc(response = labels_NG3[[i]],
                  predictor = predictions_NG3[[i]],
                  levels = c("BREAST","LUNG")) # LUNG positive

  auc_pROC <- c(auc_pROC, auc(rocCurve))

  # WITH ROCR
  pred_ROCR <- prediction(predictions_NG3[[i]], labels_NG3[[i]],
                          label.ordering = c("BREAST","LUNG")) # LUNG positive

  auc_ROCR <- c(auc_ROCR, performance(pred_ROCR, "auc")@y.values[[1]])
}

auc_mean_pROC <- mean(auc_pROC)
auc_sd_pROC <- sd(auc_pROC)
auc_mean_ROCR <- mean(auc_ROCR)
auc_sd_ROCR <- sd(auc_ROCR)

The results are slightly different:

  auc_mean_pROC auc_sd_pROC auc_mean_ROCR auc_sd_ROCR
1     0.8755556   0.1524801     0.8488889   0.2072751

I noticed that the per-resample AUC computation gives me different results in several cases, for example in resamples [5], [22] and [25]:

> auc_pROC
 [1] 0.8333333 0.8333333 1.0000000 1.0000000 0.6666667 0.8333333 0.3333333 0.8333333 1.0000000 1.0000000 1.0000000 1.0000000
[13] 0.8333333 0.5000000 0.8888889 1.0000000 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 0.6666667 0.6666667 0.8888889
[25] 0.8333333 0.6666667 1.0000000 0.6666667 1.0000000 0.6666667 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 1.0000000
[37] 0.8333333 1.0000000 0.8333333 1.0000000 0.8333333 1.0000000 1.0000000 0.6666667 1.0000000 1.0000000 1.0000000 1.0000000
[49] 1.0000000 1.0000000
> auc_ROCR
 [1] 0.8333333 0.8333333 1.0000000 1.0000000 0.3333333 0.8333333 0.3333333 0.8333333 1.0000000 1.0000000 1.0000000 1.0000000
[13] 0.8333333 0.5000000 0.8888889 1.0000000 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 0.3333333 0.6666667 0.8888889
[25] 0.1666667 0.6666667 1.0000000 0.6666667 1.0000000 0.6666667 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 1.0000000
[37] 0.8333333 1.0000000 0.8333333 1.0000000 0.8333333 1.0000000 1.0000000 0.6666667 1.0000000 1.0000000 1.0000000 1.0000000
[49] 1.0000000 1.0000000

I have tried other SVM-RFE models, but the problem remains. Why is this happening? Am I doing something wrong?

Recommended Answer

By default, the roc function in pROC attempts to detect which response level corresponds to the controls and which to the cases (you overrode that default by setting the levels argument), and whether the controls should have higher or lower predictor values than the cases. You haven't used the direction argument to set the latter.

When you resample your data, this auto-detection happens again for every resample. If your sample size is small, or your AUC is close to 0.5, it can and will happen that some ROC curves are built with the opposite direction, biasing your average towards higher values.
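As a minimal sketch of this effect (toy data, not from the question's model): when the cases happen to score lower than the controls in a given fold, pROC's auto-detected direction flips the curve and reports an AUC above 0.5, while ROCR keeps the fixed label ordering and reports the complementary value, much like resample [25] above (0.83 vs. 0.17).

library(pROC)
library(ROCR)

# Two controls (BREAST) and three cases (LUNG); the cases mostly score lower
obs  <- factor(c("BREAST","BREAST","LUNG","LUNG","LUNG"),
               levels = c("BREAST","LUNG"))
pred <- c(0.9, 0.4, 0.3, 0.2, 0.6)

# pROC with auto-detection picks direction = ">" and flips the curve
auc(roc(response = obs, predictor = pred,
        levels = c("BREAST","LUNG")))                       # ~0.833

# Forcing direction = "<" gives the unflipped AUC ...
auc(roc(response = obs, predictor = pred,
        levels = c("BREAST","LUNG"), direction = "<"))      # ~0.167

# ... which is what ROCR computes with a fixed label ordering
performance(prediction(pred, obs, label.ordering = c("BREAST","LUNG")),
            "auc")@y.values[[1]]                            # ~0.167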

Therefore, you should always set the direction argument explicitly when you resample ROC curves (or do anything similar), for instance:

rocCurve <- roc(response = labels_NG3[[i]],
                predictor = predictions_NG3[[i]],
                direction = "<",
                levels = c("BREAST","LUNG"))
