Can dplyr summarise several variables without listing each one?

Problem description

dplyr is amazingly fast, but I wonder if I'm missing something: is it possible to summarise over several variables at once?
For example:

    library(dplyr)
    library(reshape2)

    # Small example data frame: one grouping column and three numeric columns
    df <- structure(list(sex = structure(c(1L, 1L, 2L, 2L), .Label = c("boy", "girl"), class = "factor"),
                         age = c(52L, 58L, 40L, 62L),
                         bmi = c(25L, 23L, 30L, 26L),
                         chol = c(187L, 220L, 190L, 204L)),
                    .Names = c("sex", "age", "bmi", "chol"),
                    row.names = c(NA, -4L), class = "data.frame")
    df
    #    sex age bmi chol
    # 1  boy  52  25  187
    # 2  boy  58  23  220
    # 3 girl  40  30  190
    # 4 girl  62  26  204

    dg <- group_by(df, sex)

With this small data frame, it's easy to write

    summarise(dg, mean(age), mean(bmi), mean(chol))

And I know that to get what I want, I could melt, get the means, and then dcast, such as

    dm <- melt(df, id.vars = "sex")
    dmg <- group_by(dm, sex, variable)
    x <- summarise(dmg, means = mean(value))
    dcast(x, sex ~ variable, value.var = "means")

But what if I have more than 20 variables and a very large number of rows? Is there anything similar to .SD in data.table that would allow me to take the means of all variables in the grouped data frame? Or is it possible to somehow use lapply on the grouped data frame?

Thanks for any help.

Solution

The data.table idiom is lapply(.SD, mean), which is

    library(data.table)

    DT <- data.table(df)
    DT[, lapply(.SD, mean), by = sex]
    #     sex age bmi  chol
    # 1:  boy  55  24 203.5
    # 2: girl  51  28 197.0

I'm not sure of a dplyr idiom for the same thing, but you can do something like

    dg <- group_by(df, sex)
    # the names of the columns you want to summarise (everything except the grouping column)
    cols <- names(dg)[-1]
    # the dots component of your call to summarise: one unevaluated mean(<column>) call per column
    dots <- sapply(cols, function(x) substitute(mean(x), list(x = as.name(x))))
    do.call(summarise, c(list(.data = dg), dots))
    # Source: local data frame [2 x 4]
    #    sex age bmi  chol
    # 1  boy  55  24 203.5
    # 2 girl  51  28 197.0

Note that there is a GitHub issue, #178, to efficiently implement the plyr idiom colwise in dplyr.
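The answer above was written before dplyr had a built-in way to do this. As a minimal sketch, assuming dplyr 1.0.0 or later (the release that introduced across()), the per-group means can be computed without naming each column. The data frame is rebuilt here with data.frame() only to keep the example self-contained, and the name res is an illustrative choice, not part of the original answer.

```r
library(dplyr)

# Same toy data as in the question, rebuilt with data.frame() for brevity
df <- data.frame(
  sex  = factor(c("boy", "boy", "girl", "girl")),
  age  = c(52L, 58L, 40L, 62L),
  bmi  = c(25L, 23L, 30L, 26L),
  chol = c(187L, 220L, 190L, 204L)
)

# across(everything(), mean) applies mean() to every non-grouping column;
# grouping columns (here: sex) are never selected by across()
res <- df %>%
  group_by(sex) %>%
  summarise(across(everything(), mean))

print(res)
```

If the data frame also holds non-numeric columns, replacing everything() with where(is.numeric) should restrict the summary to the numeric ones.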