Problem description
I am trying to perform variable selection in a generalized linear mixed model using glmmLasso, but I get an error and a warning that I cannot resolve. The dataset is unbalanced, with some participants (PTNO) having more samples than others; there is no missing data. My dependent variable is binary, and all other variables (besides the ID variable PTNO) are continuous. I suspect something very generic is happening, but I fail to see it and have not found any solution in the documentation or on the web. The code, which is basically adapted from the glmmLasso soccer example, is:
glm8 <- glmmLasso(Group ~ NDUFV2_dCTABL + GPER1_dCTABL + ESR1_dCTABL + ESR2_dCTABL + KLF12_dCTABL + SP4_dCTABL + SP1_dCTABL + PGAM1_dCTABL + ANK3_dCTABL + RASGRP1_dCTABL + AKT1_dCTABL + NUDT1_dCTABL + POLG_dCTABL + ADARB1_dCTABL + OGG_dCTABL + PDE4B_dCTABL + GSK3B_dCTABL + APOE_dCTABL + MAPK6_dCTABL, rnd = list(PTNO = ~1), family = poisson(link = log), data = stackdata, lambda = 100, control = list(print.iter = TRUE, start = c(1, rep(0, 29)), q.start = 0.7))
The error message is shown below. Specifically, I do not believe there are any NAs in the dataset, and I am unsure what the warning about the factor variable means.
Iteration 1
Error in grad.lasso[b.is.0] <- score.beta[b.is.0] - lambda.b * sign(score.beta[b.is.0]) :
  NAs are not allowed in subscripted assignments
In addition: Warning message:
In Ops.factor(y, Mu) : '-' not meaningful for factors
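The two points raised above (no NAs in the data, an unclear factor warning) can be checked with a few base-R calls. The sketch below uses a small hypothetical data frame `toy` in place of `stackdata`, which is not reproduced here. It shows that subtracting a number from a factor returns NA with exactly this kind of warning, so one plausible reading is that NAs can be produced internally during fitting even when the data themselves contain none. This is an illustration of the R behaviour, not a trace through glmmLasso's internals.

```r
# Hypothetical toy stand-in for `stackdata`; the real data are not reproduced here.
toy <- data.frame(
  PTNO  = factor(c("p1", "p1", "p2", "p3", "p3", "p3")),
  Group = factor(c(1, 1, 2, 1, 1, 2)),   # response stored as a factor
  NDUFV2_dCTABL = c(0.3, -1.2, 0.8, 0.1, 2.0, -0.4)
)

# 1. Any missing values anywhere in the data frame?
anyNA(toy)

# 2. Which class is the response? A factor response cannot take part in
#    arithmetic such as y - Mu, which is what the warning complains about.
class(toy$Group)

# 3. Reproduce the warning: arithmetic on a factor returns NA
#    ("'-' not meaningful for factors"); suppressed here to keep output tidy.
res <- suppressWarnings(toy$Group - 0.5)
all(is.na(res))
```

On real data, `anyNA(stackdata)` and `class(stackdata$Group)` would be the corresponding checks.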
An abbreviated dataset containing the necessary variables is available in R format and can be downloaded here. I hope someone can guide me a bit on how to proceed with the analysis. Please let me know if there is anything wrong with the dataset or if you cannot download it. Any help is much appreciated.
Recommended answer
Just to follow up on @Kristofersen's comment above: it is indeed the start vector that messes up your analysis.
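As I understand the glmmLasso documentation, `start` holds starting values for the fixed and the random effects together, so its length has to match the model being fitted. The `c(1, rep(0, 29))` copied from the soccer example has length 30, which presumably fits that example's model rather than this one. The sketch below computes a suitable length under that assumption (intercept, one slope per predictor, one random intercept per PTNO level). The toy data frame and the predictor names `x1`, `x2` are hypothetical; with the real data, `stackdata` and the 19 `*_dCTABL` names would take their place.

```r
# Hypothetical toy data; replace `toy` by `stackdata` and `preds` by the
# 19 *_dCTABL predictor names for the real model.
toy <- data.frame(
  PTNO = factor(c("p1", "p1", "p2", "p3", "p3", "p3")),
  x1 = c(0.3, -1.2, 0.8, 0.1, 2.0, -0.4),
  x2 = c(1.1, 0.2, -0.5, 0.9, -1.3, 0.0)
)
preds <- c("x1", "x2")

# Assumption: fixed effects (intercept + slopes) first, then one random
# intercept per participant, as in the rnd = list(PTNO = ~1) specification.
n_fixed   <- 1 + length(preds)
n_random  <- nlevels(toy$PTNO)
start_len <- n_fixed + n_random

# All-zero starting values of the matching length.
start_vec <- rep(0, start_len)
start_len
```

Simply dropping `start`, as in the call below, avoids the question altogether, since glmmLasso then chooses its own starting values.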
If I run
glm8 <- glmmLasso(Group ~ NDUFV2_dCTABL + GPER1_dCTABL + ESR1_dCTABL + ESR2_dCTABL + KLF12_dCTABL + SP4_dCTABL + SP1_dCTABL + PGAM1_dCTABL + ANK3_dCTABL + RASGRP1_dCTABL + AKT1_dCTABL + NUDT1_dCTABL + POLG_dCTABL + ADARB1_dCTABL + OGG_dCTABL + PDE4B_dCTABL + GSK3B_dCTABL + APOE_dCTABL + MAPK6_dCTABL, rnd = list(PTNO = ~1), family = binomial(), data = stackdata, lambda = 100, control = list(print.iter = TRUE))
then everything is fine and dandy (i.e., it converges and produces a solution). You copied the Poisson regression example, and you need to adapt the code to your situation. I have no idea whether the output makes sense.
Quick note: I used the binomial distribution in the code above since your outcome is binary. If it makes sense to estimate relative risks, then Poisson may be reasonable (and it also converges), but you need to recode your outcome first: the two groups are coded as 1 and 2, and that will certainly mess up the Poisson regression.
In other words, run
stackdata$Group <- stackdata$Group-1
before running the analysis.
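One caveat, assuming the "Ops.factor(y, Mu)" warning in the question means that Group is stored as a factor: in that case the one-liner above would itself return NA with the same warning, because arithmetic on a factor is not defined. A minimal base-R sketch that recodes 1/2 to 0/1 whether Group is a factor or numeric is shown below; `recode01` is a hypothetical helper name, and the vectors are toy data.

```r
# Hypothetical toy response coded 1/2, stored either as a factor or as numbers.
g_factor  <- factor(c(1, 2, 2, 1))
g_numeric <- c(1, 2, 2, 1)

# Convert via the character labels so that a factor's labels "1"/"2" are used
# (not its internal level codes); numeric input passes through unchanged.
recode01 <- function(g) as.integer(as.character(g)) - 1L

recode01(g_factor)
recode01(g_numeric)
```

With the real data this would read `stackdata$Group <- recode01(stackdata$Group)`.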