Returning multiple columns from a data.table aggregation

Problem description

I would like to use data.table as an alternative to aggregate() or ddply(), because these two methods do not scale to large objects as efficiently as hoped. Unfortunately, I have not figured out how to get vector-returning aggregate functions to generate multiple columns in the data.table result.
For example:

    # required packages
    library(plyr)
    library(data.table)

    # simulated data
    x <- data.table(value=rnorm(100), g=rep(letters[1:5], each=20))

    # ddply output that I would like to get from data.table
    ddply(data.frame(x), 'g', function(i) quantile(i$value))
      g        0%        25%          50%       75%     100%
    1 a -1.547495 -0.7842795  0.202456288 0.6098762 2.223530
    2 b -1.366937 -0.4418388 -0.085876995 0.7826863 2.236469
    3 c -2.064510 -0.6411390 -0.257526983 0.3213343 1.039053
    4 d -1.773933 -0.5493362 -0.007549273 0.4835467 2.116601
    5 e -0.780976 -0.2315245  0.194869630 0.6698881 2.207800

    # not quite what I am looking for:
    x[, quantile(value), by=g]
        g           V1
     1: a -1.547495345
     2: a -0.784279536
     3: a  0.202456288
     4: a  0.609876241
     5: a  2.223529739
     6: b -1.366937074
     7: b -0.441838791
     8: b -0.085876995
     9: b  0.782686277
    10: b  2.236468703

Essentially, the output from ddply and aggregate is in wide format, while the output from data.table is in long format. Is the answer to reshape the data, or to pass some additional argument to the data.table call?

Solution

Try coercing to a list:

    > x[, as.list(quantile(value)), by=g]
       g         0%          25%         50%       75%     100%
    1: a -1.7507334 -0.632331909  0.07435249 0.7459778 1.428552
    2: b -2.2043481 -0.005652353  0.10534325 0.5769475 1.241754
    3: c -1.9313985 -1.120737610 -0.26116926 0.6953009 1.360017
    4: d -0.7434664 -0.055232431  0.22062823 1.1864389 3.021124
    5: e -2.0101657 -0.468674094  0.20209610 0.6286448 2.433152
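The list coercion works because data.table turns each element of a list returned in j into its own column, whereas a plain vector is stacked into a single column with one row per element. The sketch below puts the long and wide results side by side on the same simulated data. The fixed seed, the custom probabilities, and the column names p10, p50, and p90 are assumptions added for illustration and are not part of the original question or answer.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed so the run is reproducible
x <- data.table(value = rnorm(100), g = rep(letters[1:5], each = 20))

# A plain vector in j is stacked into one column:
# 5 quantiles for each of the 5 groups gives 25 rows.
long <- x[, quantile(value), by = g]

# A list in j yields one column per list element:
# one row per group, with one column per quantile.
wide <- x[, as.list(quantile(value)), by = g]

# Hypothetical variation: custom probabilities with explicit column names.
probs <- c(0.1, 0.5, 0.9)
custom <- x[, setNames(as.list(quantile(value, probs = probs)),
                       c("p10", "p50", "p90")),
            by = g]

print(dim(long))
print(dim(wide))
print(custom)
```

Naming the list elements explicitly, as in the last call, avoids column names such as `50%`, which have to be quoted with backticks when they are referenced later.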