Assignment via `:=` in a `for` loop (R data.table)

Problem description

I'm trying to assign some new variables within a for loop (I'm trying to create some variables with common structure, but which are subsample-dependent). I've tried for the life of me to reproduce this error on sample data and I can't.
Here's code that works and gets the gist of what I want to do:

    library(data.table)

    # Sample panel: 100 ids, 20 periods each, x is the running sum of y within id
    dt <- data.table(id = rep(1:100, each = 20), period = rep(-9:10, 100),
                     grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                     y = runif(2000, min = 0, max = 5),
                     key = c("id", "period"))[, x := cumsum(y), by = id]
    dt2 <- dt[id %in% seq(1, 100, by = 2), ]
    dt3 <- dt[id %in% seq(1, 100, by = 3), ]

    # For each subsample, attach the period-0 sum of x by grp, then restore the key
    for (dd in list(dt, dt2, dt3)) {
      setkey(setkey(dd, grp)[dd[period == 0, sum(x), by = grp], x_at_0_by_grp := V1], id, period)
    }

This works fine. However, when I do this in my own code, it generates the Invalid .internal.selfref warning (and doesn't create the variable I want).

In fact, when I subset my data to only those columns needed within the merge, it also works fine on my data (though it doesn't save to the original data sets). This suggests to me it's a problem with keying, but I'm explicitly setting the keys every step of the way. I'm completely lost on how to debug this from here because I can't get the error to repeat except on my full data set.

If I break out the operation into steps, the error arises at the merge step:

    for (dd in list(dt, dt2, dt3)) {
      dummy <- dd[period == 0, sum(x), by = grp]
      setkey(dd, grp)
      dd[dummy, x_at_0_by_grp := V1]  # ***ERROR HERE***
      setkey(dd, id, period)
    }

Quick update: this also produces the error if I cast it with lapply instead of within a for loop. Any ideas what on earth is going on here?

UPDATE: I've come up with a workaround by doing:

    nnames <- c("dt", "dt2", "dt3")
    dt_list <- list(dt, dt2, dt3)
    for (ii in 1:3) {
      dummy <- copy(dt_list[[ii]])  # work on a deep copy of each table
      dummy[, x_at_0_by_grp := sum(x[period == 0]), by = grp]
      assign(nnames[ii], dummy)     # write the result back under its original name
    }

I would still like to understand what's going on, and perhaps find a better way of assigning variables iteratively in situations like this.

Solution

With 20-30 criteria, keeping them outside of a list (with manual names like dt2, etc.) is too clunky, so I'll just assume you have them all in dt_list.
I suggest making tables with just the stat you're computing, and then rbinding them:

    xxt <- rbindlist(lapply(1:length(dt_list), function(i)
      dt_list[[i]][, list(cond = i, xx = sum(x[period == 0])), by = grp]))

which creates

        grp cond       xx
     1:   1    1 623.3448
     2:   2    1 784.8438
     3:   4    1 699.2362
     4:   3    1 367.7196
     5:   1    2 323.6268
     6:   4    2 307.0374
     7:   2    2 447.0753
     8:   3    2 185.7377
     9:   1    3 275.4897
    10:   4    3 243.0214
    11:   2    3 149.6041
    12:   3    3 166.3626

You can easily merge back if you really want those vars. For example, for dt2:

    myi = 2
    setkey(dt_list[[myi]], grp)[xxt[cond == myi, list(grp, xx)]]

This doesn't resolve the bug you're running into, but I think it is a better approach.
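If the per-group value is wanted as a column on every table, the list layout above can be combined with the `copy()` workaround from the question. The following is only a minimal sketch under a few assumptions: the seed and the list element names are illustrative, and the data are the sample data from the question rather than the full data set. It does not diagnose the selfref warning itself.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Same sample data as in the question
dt <- data.table(id = rep(1:100, each = 20),
                 period = rep(-9:10, 100),
                 grp = rep(sample(4, size = 100, replace = TRUE), each = 20),
                 y = runif(2000, min = 0, max = 5),
                 key = c("id", "period"))[, x := cumsum(y), by = id]

# Keep every subsample in one named list (names are illustrative)
dt_list <- list(dt  = dt,
                dt2 = dt[id %in% seq(1, 100, by = 2)],
                dt3 = dt[id %in% seq(1, 100, by = 3)])

# copy() gives each element a fresh table; := adds the column to that copy,
# and lapply() stores the result back in the list
dt_list <- lapply(dt_list, function(d) {
  copy(d)[, x_at_0_by_grp := sum(x[period == 0]), by = grp][]
})
```

Because each element is copied before the assignment, the original `dt` is left without the new column; the updated tables live only in `dt_list`. That costs memory on large data, so it is a trade-off rather than a recommendation.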