Problem description
I'm trying to run a separate logistic regression for each of ~400k predictor variables. I would like to capture the output of each run in one row of an output table.
My data is organised in two parts. I have a 400000 x 189 double matrix (mydatamatrix) that contains the observations for each of my 400000 predictor variables (P1) measured in 189 individuals. I also have a second 189 x 20 data frame (mydataframe) containing the outcome variable and another predictor variable (O1 and P2), plus 18 other variables not used in this particular analysis.
My regression model is O1 ~ P1 + P2, where O1 is binary.
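Written out, and assuming P2 enters as a categorical covariate (as the as.factor() call in the loop suggests), the model fitted for predictor i and individual j is roughly the following. The symbols beta and gamma are notation introduced here for illustration, not taken from the original post:

```latex
\operatorname{logit}\,\Pr(O1_j = 1)
  = \beta_{0i} + \beta_{1i}\,P1_{ij} + \gamma_{i,\,P2_j},
  \qquad i = 1,\dots,400000,\quad j = 1,\dots,189
```

The four values kept per run are the estimate, standard error, z value and p-value for the P1 coefficient, written here as \beta_{1i}.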
I have the following working loop:
# create an output table for the results
output <- data.frame(matrix(nrow = 400000, ncol = 4))
names(output) <- c("Estimate", "Std. Error", "z value", "Pr(>|z|)")
# run the logistic regression for each predictor i and store the output in the output table
# Predictors are the rows of the 400000 x 189 matrix, so index rows, not columns
row.names(output) <- row.names(mydatamatrix)  # set once, outside the loop
P2 <- as.factor(mydataframe$P2)               # convert once, not on every iteration
for (i in seq_len(nrow(mydatamatrix))) {
  result <- glm(mydataframe$O1 ~ mydatamatrix[i, ] + P2, family = binomial)
  # Row 2 of the coefficient table is the P1 term: estimate, SE, z value, p-value
  output[i, ] <- coef(summary(result))[2, ]
}
However, the run time is huge (it took over an hour to output the first 20k tests). Is there a more efficient way to run this analysis?
Recommended answer
It will likely be faster if you use apply instead of a for loop, mainly because the results are no longer written into a data frame one cell at a time:
# Margin 1: one fit per row (predictor) of the 400000 x 189 matrix
t(apply(mydatamatrix, 1,
        function(x)
          coef(summary(glm(mydataframe$O1 ~ x + as.factor(mydataframe$P2),
                           family = binomial)))[2, 1:4]))
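To try the idea without the full data set, here is a minimal self-contained sketch. It assumes the predictors sit in the rows of the matrix, as the stated 400000 x 189 dimensions imply. The simulated data and the helper name fit_one are hypothetical stand-ins, not part of the original post. It uses vapply in place of apply so that the shape of each result is checked.

```r
# Simulated stand-ins for the data in the question (much smaller; all values are made up)
set.seed(1)
n_pred <- 50
n_ind <- 189
mydatamatrix <- matrix(rnorm(n_pred * n_ind), nrow = n_pred,
                       dimnames = list(paste0("pred", seq_len(n_pred)), NULL))
mydataframe <- data.frame(O1 = rbinom(n_ind, 1, 0.5),
                          P2 = sample(c("a", "b", "c"), n_ind, replace = TRUE))

# Pull out the outcome and convert the covariate once, instead of inside every fit
O1 <- mydataframe$O1
P2 <- as.factor(mydataframe$P2)

# Fit one model for a single predictor vector and return row 2 of the
# coefficient table (the P1 term): estimate, standard error, z value, p-value
fit_one <- function(x) {
  coef(summary(glm(O1 ~ x + P2, family = binomial)))[2, ]
}

# vapply returns a 4 x n_pred numeric matrix; transpose to one row per predictor
output <- t(vapply(seq_len(nrow(mydatamatrix)),
                   function(i) fit_one(mydatamatrix[i, ]),
                   numeric(4)))
rownames(output) <- rownames(mydatamatrix)
colnames(output) <- c("Estimate", "Std. Error", "z value", "Pr(>|z|)")
head(output)
```

On the real data the same call would run over all 400000 rows. The sketch only removes per-iteration overhead, so the model fits themselves still take time.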