Plotting the mean ROC curve of multiple ROC curves in R

Problem description

I have a dataset of 100 samples, each of which has 195 mutations with their corresponding known clinical significance ("RealClass") and a predicted value from some prediction tool ("PredictionValues").

For the demonstration, this is a random dataset that has the same structure as my dataset:

set.seed(500)  # make the random demonstration data reproducible
predictions_100_samples <- data.frame(
    Sample = rep(1:100, each = 195),  # 195 mutations per sample
    PredictionValues = sample(seq(0, 1, length.out = 19500)),
    RealClass = rep(c("pathogenic", "benign"), each = 10, length.out = 19500)
)
colours_for_ROC_curves <- rainbow(n = 100)

I plotted all 100 samples as ROC curves with the pROC package:

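library("pROC")
# Row indices of the 195 mutations that belong to sample i
sample_rows <- function(i) (((i - 1) * 195) + 1):(i * 195)
# Draw the first sample to set up the plot
roc_both <- plot(roc(predictor = predictions_100_samples[sample_rows(1), 2],
                     response = predictions_100_samples[sample_rows(1), 3]),
                 col = colours_for_ROC_curves[1], main = "100 samples ROC curves",
                 legacy.axes = TRUE, lwd = 1)
# Sample 1 is already drawn, so add samples 2 to 100 on top of it
for (i in 2:100) {
    roc_both <- plot(roc(predictor = predictions_100_samples[sample_rows(i), 2],
                         response = predictions_100_samples[sample_rows(i), 3]),
                     col = colours_for_ROC_curves[i], add = TRUE, lwd = 1)
}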

This is what the final plot looks like:

Now I want to add the mean ROC curve of all 100 plotted ROC curves to the same plot. I tried to use the sensitivities and specificities that the "roc" function calculates for each threshold inside the loop I wrote (they are available as roc_both$sensitivities, roc_both$specificities and roc_both$thresholds).

But the main problem was that the chosen thresholds differed between the 100 ROC curves I plotted, so I couldn't calculate the mean ROC curve manually.

Is there a different package that would let me produce the mean ROC curve of multiple ROC curves? Or is there a package that allows setting the thresholds for calculating sensitivity and specificity manually, so that I could calculate the mean ROC curve afterwards? Or do you have a different solution for my problem?

Thanks!

Recommended answer

You can use cutpointr for specifying the thresholds manually via the oc_manual function. I altered the data generation a bit so that the ROC curve looks a little nicer.

We apply the same sequence of thresholds to all samples and take the mean of the sensitivity and specificity per threshold to get the "mean ROC curve".

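# Sample IDs are interleaved (times = 195) so that every sample contains
# both benign and pathogenic observations
predictions_100_samples <- data.frame(
    Sample = rep(c(1:100), times = 195),
    PredictionValues = c(rnorm(n = 9750), rnorm(n = 9750, mean = 1)),
    RealClass = c(rep("benign", times = 9750), rep("pathogenic", times = 9750))
)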

library(cutpointr)
library(tidyverse)
mean_roc <- function(data, cutoffs = seq(from = -5, to = 5, by = 0.5)) {
    map_df(cutoffs, function(cp) {
        out <- cutpointr(data = data, x = PredictionValues, class = RealClass,
                         subgroup = Sample, method = oc_manual, cutpoint = cp,
                         pos_class = "pathogenic", direction = ">=")
        data.frame(cutoff = cp,
                   sensitivity = mean(out$sensitivity),
                   specificity = mean(out$specificity))
    })
}

mr <- mean_roc(predictions_100_samples)
ggplot(mr, aes(x = 1 - specificity, y = sensitivity)) +
    geom_step() + geom_point() +
    theme(aspect.ratio = 1)
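
To make the per-threshold averaging above more concrete, here is a minimal base R sketch of the same idea without cutpointr. The helper names sens_spec_at and mean_roc_base and the small toy data frame are hypothetical and only serve the illustration; the sketch assumes that a score greater than or equal to the cutoff counts as a "pathogenic" prediction, as in the direction = ">=" setting used above.

```r
# Hypothetical helper: sensitivity and specificity of one sample at one cutoff.
# A score >= cutoff is predicted as the positive class.
sens_spec_at <- function(scores, classes, cutoff, pos_class = "pathogenic") {
    predicted_pos <- scores >= cutoff
    actual_pos <- classes == pos_class
    c(sensitivity = sum(predicted_pos & actual_pos) / sum(actual_pos),
      specificity = sum(!predicted_pos & !actual_pos) / sum(!actual_pos))
}

# Hypothetical helper: apply the same cutoffs to every sample and
# average sensitivity and specificity across samples at each cutoff.
mean_roc_base <- function(data, cutoffs = seq(from = -5, to = 5, by = 0.5)) {
    per_sample <- split(data, data$Sample)
    rows <- lapply(cutoffs, function(cp) {
        # 2 x n_samples matrix: one column per sample
        m <- vapply(per_sample, function(d) {
            sens_spec_at(d$PredictionValues, d$RealClass, cp)
        }, c(sensitivity = 0, specificity = 0))
        data.frame(cutoff = cp,
                   sensitivity = mean(m["sensitivity", ]),
                   specificity = mean(m["specificity", ]))
    })
    do.call(rbind, rows)
}

# Small simulated data in the same layout as above (made-up values)
set.seed(1)
toy <- data.frame(
    Sample = rep(1:10, times = 40),
    PredictionValues = c(rnorm(n = 200), rnorm(n = 200, mean = 1)),
    RealClass = rep(c("benign", "pathogenic"), each = 200)
)
mr_base <- mean_roc_base(toy)
```

The result has one row per cutoff and can be plotted in the same way as mr. Note that every sample must contain both classes, otherwise one of the ratios divides by zero.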

You can plot the separate ROC curves and the added mean ROC curve with cutpointr this way:

cutpointr(data = predictions_100_samples,
          x = PredictionValues, class = RealClass, subgroup = Sample,
          pos_class = "pathogenic", direction = ">=") %>%
    plot_roc(display_cutpoint = FALSE) + theme(legend.position = "none") +
    geom_line(data = mr, mapping = aes(x = 1 - specificity, y = sensitivity),
              color = "black")
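
A different way to get around the unequal thresholds is to average vertically: interpolate every curve on a shared grid of false positive rates and average the sensitivities at each grid point. This is not the threshold averaging used above but a swapped-in alternative, sketched here in base R under the assumption that higher scores indicate "pathogenic". The helper names roc_points and vertical_mean_roc, the grid spacing and the toy data are hypothetical.

```r
# Hypothetical helper: empirical ROC points (FPR, TPR) of one sample.
roc_points <- function(scores, classes, pos_class = "pathogenic") {
    actual_pos <- classes == pos_class
    # Every observed score is a candidate cutoff; Inf yields the (0, 0) corner
    cutoffs <- c(Inf, sort(unique(scores), decreasing = TRUE))
    tpr <- vapply(cutoffs, function(cp) mean(scores[actual_pos] >= cp), numeric(1))
    fpr <- vapply(cutoffs, function(cp) mean(scores[!actual_pos] >= cp), numeric(1))
    data.frame(fpr = fpr, tpr = tpr)
}

# Hypothetical helper: interpolate each curve on a shared FPR grid, then average.
vertical_mean_roc <- function(data, fpr_grid = seq(0, 1, by = 0.05)) {
    per_sample <- split(data, data$Sample)
    tpr_matrix <- vapply(per_sample, function(d) {
        pts <- roc_points(d$PredictionValues, d$RealClass)
        # ties = max keeps the highest TPR where a curve is vertical
        approx(pts$fpr, pts$tpr, xout = fpr_grid, ties = max)$y
    }, numeric(length(fpr_grid)))
    # Rows are grid points, columns are samples
    data.frame(fpr = fpr_grid, sensitivity = rowMeans(tpr_matrix))
}

# Small simulated data in the same layout as above (made-up values)
set.seed(1)
toy <- data.frame(
    Sample = rep(1:10, times = 40),
    PredictionValues = c(rnorm(n = 200), rnorm(n = 200, mean = 1)),
    RealClass = rep(c("benign", "pathogenic"), each = 200)
)
vm <- vertical_mean_roc(toy)
```

The averaged curve can be drawn with, for example, plot(vm$fpr, vm$sensitivity, type = "l"). Vertical averaging answers the question "what is the average sensitivity at a given false positive rate", whereas threshold averaging keeps the link to a concrete cutoff, so the two mean curves generally differ.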

Alternatively, you may want to look into the theory on summary ROC curves (SROC) for fitting a parametric model that combines multiple ROC curves.