R: How to repeatedly "loop" the results of a function?

Question

I have written some code in R. This code takes some data and splits it into a training set and a test set. Then I fit a "survival random forest" model on the training set. Afterwards, I use the model to predict observations in the test set.

Due to the type of problem I am dealing with ("survival analysis"), a confusion matrix has to be made for each "unique time" (the values stored in "unique.death.times"). For each of these confusion matrices, I am interested in the corresponding "sensitivity" value (e.g. sensitivity_1001, sensitivity_2005, etc.). I am trying to get all of these sensitivity values: I would like to plot them against the unique death times and determine the average sensitivity.

In order to do this, I need to repeatedly calculate the sensitivity for each time point in "unique.death.times". I tried doing this manually and it is taking a long time.

Could someone please show me how to do this with a "loop"?

I have posted my code below:

#load libraries
library(survival)
library(data.table)
library(pec)
library(ranger)
library(caret)

#load data
data(cost)

#split data into train and test
ind <- sample(1:nrow(cost),round(nrow(cost) * 0.7,0))
cost_train <- cost[ind,]
cost_test <- cost[-ind,]

#fit survival random forest model
ranger_fit <- ranger(Surv(time, status) ~ .,
                data = cost_train,
                mtry = 3,
                verbose = TRUE,
                write.forest=TRUE,
                num.trees= 1000,
                importance = 'permutation')

#optional: plot training results
plot(ranger_fit$unique.death.times, ranger_fit$survival[1,], type = 'l', col = 'red')    # for first observation
lines(ranger_fit$unique.death.times, ranger_fit$survival[21,], type = 'l', col = 'blue')  # for twenty first observation

#predict observations test set using the survival random forest model
ranger_preds <- predict(ranger_fit, cost_test, type = 'response')$survival
ranger_preds <- data.table(ranger_preds)
colnames(ranger_preds) <- as.character(ranger_fit$unique.death.times)

#here is my question:

#get results for some time (time >1001)
prediction <- ranger_preds$'1001' > 0.5     # time has to be in "unique.death.times."
real <- cost_test$time >= 1001

#get confusion matrix and sensitivity for this same time
confusion = confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity_1001 = confusion$byClass[1]

#now, get the results for a second time
prediction <- ranger_preds$'2005' > 0.5     # for any time in unique.death.times.  "2005"
real <- cost_test$time >= 2005

#get confusion matrix and sensitivity for the second time
confusion = confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
sensitivity_2005 = confusion$byClass[1]

#question: how do I get the "sensitivity" for all the times in "unique.death.times", the average sensitivity and "plot sensitivity vs unique death times"?

Could someone please help me with this?

Thanks

Answer provided by user "Justin Singh". It seems to have the right idea, but the following error is produced:

sensitivity <- list()
for (time in names(ranger_preds)) {
    prediction <- ranger_preds[which(names(ranger_preds) == time)] > 0.5
    real <- cost_test$time >= as.numeric(time)
    confusion <- confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
    sensitivity[as.character(i)] <- confusion$byclass[1]
}

Error in confusionMatrix.default(as.factor(prediction), as.factor(real),  :
  The data must contain some levels that overlap the reference.

Recommended answer

Assuming that each column name of ranger_preds is a time value that as.numeric() can convert, you could have something similar to this:

sensitivity <- list()
for (time in names(ranger_preds)) {
    # ranger_preds is a data.table, so pick the column by name with [[ ]]
    prediction <- ranger_preds[[time]] > 0.5
    real <- cost_test$time >= as.numeric(time)
    # fix both factor levels so a time point with only one class still works
    confusion <- confusionMatrix(factor(prediction, levels = c(FALSE, TRUE)),
                                 factor(real, levels = c(FALSE, TRUE)),
                                 positive = 'TRUE')
    sensitivity[[time]] <- confusion$byClass[1]
}
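
A note on the error quoted in the question: one plausible cause (an assumption, not something confirmed in the thread) is that at some time points every prediction falls into one class and every observed value into the other, so the two factors built with as.factor() share no level. The base-R sketch below uses made-up vectors to show how as.factor() drops the unused level and how fixing the levels explicitly avoids that.

```r
# Made-up example: every prediction is TRUE, every observed value is FALSE
prediction <- c(TRUE, TRUE, TRUE)
real <- c(FALSE, FALSE, FALSE)

# as.factor() keeps only the values that actually occur
pred_levels_default <- levels(as.factor(prediction))  # just "TRUE"
real_levels_default <- levels(as.factor(real))        # just "FALSE"

# Stating both levels up front gives the two factors the same level set
pred_f <- factor(prediction, levels = c(FALSE, TRUE))
real_f <- factor(real, levels = c(FALSE, TRUE))

# The cross-table is now always 2 x 2, even if some cells are zero
tab <- table(pred_f, real_f)
print(tab)
```

With both factors sharing the levels "FALSE" and "TRUE", a function that compares them no longer depends on which classes happen to appear at a given time point.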

The idea is to build a single list of sensitivities instead of creating one variable per time point. Each entry is named after the corresponding time in names(ranger_preds), so for 2005 you would get the sensitivity with sensitivity[["2005"]] (or sensitivity$`2005`, with backticks, because the name starts with a digit).
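
As a small illustration of the named-list idea, here is a base-R sketch. The two sensitivity values are made up and only stand in for whatever the loop would compute.

```r
# Made-up values standing in for sensitivities computed in the loop
sensitivity <- list()
sensitivity[["1001"]] <- 0.80
sensitivity[["2005"]] <- 0.60

# Names that start with a digit need [[ ]] or backticks after $
s_2005_a <- sensitivity[["2005"]]
s_2005_b <- sensitivity$`2005`

# Flatten the list into a named numeric vector, then average it
sens_vec <- unlist(sensitivity)
mean_sens <- mean(sens_vec, na.rm = TRUE)
print(sens_vec)
print(mean_sens)
```

Flattening with unlist() keeps the time points as names, which is convenient for computing the average or plotting later.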

I haven't tested this, so there might be errors and it might not be the most efficient - however, hopefully it will lead you in the right direction.
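
For the remaining parts of the question (average sensitivity and a plot against the time points), here is a minimal base-R sketch. It does not use ranger or caret. The matrix preds, the vector obs_time, and the helper sens_at are hypothetical stand-ins for ranger_preds, cost_test$time, and the confusion-matrix step, and sensitivity is computed by hand as true positives divided by all actual positives.

```r
# Toy stand-ins (made up): 3 time points, 5 test observations
times <- c(100, 200, 300)
preds <- matrix(c(0.9, 0.8, 0.7, 0.6, 0.4,   # predicted survival at time 100
                  0.8, 0.4, 0.6, 0.3, 0.2,   # predicted survival at time 200
                  0.4, 0.6, 0.3, 0.2, 0.1),  # predicted survival at time 300
                nrow = 5,
                dimnames = list(NULL, as.character(times)))
obs_time <- c(350, 250, 150, 120, 50)

# Hypothetical helper: sensitivity at one time point, computed by hand
sens_at <- function(t) {
  predicted <- preds[, as.character(t)] > 0.5
  actual <- obs_time >= t
  # NaN if there are no actual positives at this time point
  sum(predicted & actual) / sum(actual)
}

# One sensitivity per time point, then the average and the plot
sens <- vapply(times, sens_at, numeric(1))
mean_sens <- mean(sens, na.rm = TRUE)
plot(times, sens, type = "b",
     xlab = "Unique death time", ylab = "Sensitivity")
```

With the real objects, times would presumably come from ranger_fit$unique.death.times, and the values collected in the loop above could be plotted the same way.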

08-21 11:03