Problem description
See the fake dataset below.
library(data.table)
library(MASS)
n <- 5000
DT <- data.table(
  grp  = 1:n,
  name = as.character(as.hexmode(1:n)),
  x    = sample(1:400, n, replace = TRUE)
)
setkey(DT, grp)
UIDlist <- unique(DT[, grp])
IDnamelist <- paste0("V", seq_along(UIDlist))
# Add one logical dummy column per grp level, then drop the last one (V5000)
test <- DT[, (IDnamelist) := lapply(UIDlist, function(u) grp == u)][, V5000 := NULL]
I have a data.table with four columns: "grp", "Name", "x", and "y". I add a dummy variable for each level of "grp", and then I need to run a regression using glm.nb from the MASS package.
First I tried this:
SumResult <- glm.nb(x ~ factor(grp), data = test)
But when adding dummies, note that when there are N levels in "grp", we should add only N-1 dummies. So I don't think this method is appropriate.
So I tried this instead:
SumResult <- glm.nb( x ~ V1 + V2 + V3 + V4 + .....+ V4999 , data = test)
It is impractical to write out all of V1, V2, ... V4999 by hand to run the regression.
Is there code that can achieve this?
Thanks.
Recommended answer
You can try to create your formula object by string manipulation:
formula <- as.formula(paste0("x ~ ", paste(names(test)[-(1:3)], collapse = " + ")))
sumresult <- glm.nb(formula, data = test)
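The same idea can be sketched on a small scale with reformulate() from base R, which builds the formula from a character vector of term names. The toy data frame, its group means, and the names toy, dummy_cols and f below are assumptions made up for illustration, not part of the question's data. Selecting the dummy columns by name pattern rather than by position is a design choice that keeps the code working if the column order changes.

```r
library(MASS)  # provides glm.nb

set.seed(1)
# Hypothetical toy data: 4 groups with 50 observations each
toy <- data.frame(grp = rep(1:4, each = 50))
toy$x <- rnbinom(nrow(toy), mu = c(5, 10, 20, 40)[toy$grp], size = 2)

# One logical dummy per level, leaving out the last level (N - 1 dummies)
for (g in 1:3) toy[[paste0("V", g)]] <- toy$grp == g

# Pick the dummy columns by name pattern instead of by position
dummy_cols <- grep("^V[0-9]+$", names(toy), value = TRUE)

# reformulate() assembles x ~ V1 + V2 + V3 from the column names
f <- reformulate(dummy_cols, response = "x")
fit <- glm.nb(f, data = toy)
length(coef(fit))  # intercept plus one coefficient per dummy
```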
You can also use the more readable code of @BrandonBertelsen:
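# Drop grp and name (but keep x) so that "." expands to the dummy columns only
glm.nb(x ~ ., data = test[, !c("grp", "name"), with = FALSE])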
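On the N-1 concern from the question: with R's default treatment contrasts, a factor in a formula that has an intercept is already expanded into N-1 dummy columns, with the first level as the reference. The following is a minimal check on a hypothetical toy data frame (the name toy and its values are made up for illustration):

```r
# Hypothetical toy data with 4 levels in grp
toy <- data.frame(grp = rep(1:4, times = 3), x = 1:12)

# Design matrix that R builds for x ~ factor(grp)
mm <- model.matrix(x ~ factor(grp), data = toy)
colnames(mm)
ncol(mm) - 1  # number of dummy columns, excluding the intercept
```

If that holds for your data, x ~ factor(grp) may be enough without creating the dummy columns manually.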