Problem description

I'm trying to understand the behaviour of eval inside a data.table used as a "frame". With the following data.table:

    library(data.table)
    set.seed(1)
    foo = data.table(var1 = sample(1:3, 1000, replace = TRUE),
                     var2 = rnorm(1000),
                     var3 = sample(letters[1:5], 1000, replace = TRUE))

I'm trying to replicate this instruction:

    foo[var1 == 1, sum(var2), by = var3]

using a function built on eval:

    eval1 = function(s) eval(parse(text = s), envir = sys.parent())

As you can see, tests 1 and 3 work, but I don't understand which is the "correct" envir to set in eval for test 2:

    var_i = "var1"
    var_j = "var2"
    var_by = "var3"

    # test 1 works
    foo[eval1(var_i) == 1, sum(var2), by = var3]
    # test 2 doesn't work
    foo[var1 == 1, sum(eval1(var_j)), by = var3]
    # test 3 works
    foo[var1 == 1, sum(var2), by = eval1(var_by)]

Solution

The j expression looks up its variables in the environment of .SD, which stands for Subset of Data. .SD is itself a data.table that holds the columns for the current group.

When you write the following directly:

    foo[var1 == 1, sum(eval(parse(text = var_j))), by = var3]

the j expression is internally optimised, i.e. replaced by sum(var2). But sum(eval1(var_j)) is not optimised and stays as it is.

Then, when it is evaluated for each group, eval1 has to find var2, which does not exist in the parent.frame() from which the function is called; it exists only in .SD. As an example, let's do this:

    eval1 <- function(s) eval(parse(text = s), envir = parent.frame())
    foo[var1 == 1, { var2 = 1L; eval1(var_j) }, by = var3]
    #    var3 V1
    # 1:    e  1
    # 2:    c  1
    # 3:    a  1
    # 4:    b  1
    # 5:    d  1

It finds var2 in its parent frame, i.e. the local var2 = 1L, rather than the column. So we have to point eval to the right environment, through an additional argument whose value is .SD:

    eval1 <- function(s, env) eval(parse(text = s), envir = env, enclos = parent.frame())
    foo[var1 == 1, sum(eval1(var_j, .SD)), by = var3]
    #    var3         V1
    # 1:    e  11.178035
    # 2:    c -12.236446
    # 3:    a  -8.984715
    # 4:    b  -2.739386
    # 5:    d  -1.159506
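The lookups the answer relies on follow base R's eval and parent.frame rules, so they can be checked without data.table. What follows is a minimal base-R sketch under a simplifying assumption: run_j() is a hypothetical stand-in for data.table's per-group evaluation of j in which the group's rows are reachable only as .SD and the columns are not bound as variables. It is not how data.table is actually implemented. eval_in_caller() and eval_in_data() are hypothetical names for the answer's two versions of eval1, and grp is made-up data.

```r
# Base R only; data.table is not needed for this sketch.

# Hypothetical, simplified stand-in for data.table's per-group j evaluation
# (NOT its real implementation): the group's rows are reachable as `.SD`,
# but the columns are not bound as variables in the frame where j runs.
run_j <- function(j, group_data) {
  caller <- parent.frame()             # unmatched names fall back to the caller
  j_env <- new.env(parent = caller)
  assign(".SD", group_data, envir = j_env)
  eval(j, envir = j_env)
}

# Mirrors the answer's first eval1: look names up in the calling frame.
eval_in_caller <- function(s) eval(parse(text = s), envir = parent.frame())

# Mirrors the answer's fixed eval1: look names up in `env` (a list or
# data frame such as .SD) first, then in the calling frame via `enclos`.
eval_in_data <- function(s, env) {
  eval(parse(text = s), envir = env, enclos = parent.frame())
}

grp <- data.frame(var2 = c(0.5, 1.5, 2))  # one made-up "group"
var_j <- "var2"

# 1) var2 is not a variable in j's frame, so the lookup fails.
r1 <- tryCatch(run_j(quote(sum(eval_in_caller(var_j))), grp),
               error = function(e) e)
# r1 is an error: object 'var2' not found

# 2) A local var2 created inside j is found instead of the column.
r2 <- run_j(quote({ var2 <- 1L; eval_in_caller(var_j) }), grp)
# r2 is 1L

# 3) Passing .SD explicitly makes eval() search the group's columns.
r3 <- run_j(quote(sum(eval_in_data(var_j, .SD))), grp)
# r3 is 4

# 4) Names missing from .SD fall back to j's frame through `enclos`.
r4 <- run_j(quote({ k <- 10; eval_in_data("sum(var2) * k", .SD) }), grp)
# r4 is 40
```

The split between envir and enclos in eval_in_data is the point of the answer's fix: envir decides where column names are looked up first, while enclos keeps variables from the calling frame (such as k in case 4) reachable.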
08-14 19:54