Problem description
I have a data frame:
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA
md
I want to count the number of 5s in each column, by device. I can do it like this:
library(dplyr)
group_by(md, device) %>%
  summarise(counts.a = sum(a == 5, na.rm = TRUE),
            counts.b = sum(b == 5, na.rm = TRUE),
            counts.c = sum(c == 5, na.rm = TRUE))
However, in real life I'll have many more variables (myvars can be very long), so I can't specify counts.a, counts.b, etc. manually dozens of times.
Does dplyr allow running the count of 5s on all myvars columns at once?
Thanks!
Recommended answer
If you care about the names starting with "counts.", you could do it like this in a dplyr pipe:
md %>%
  group_by(device) %>%
  summarise_each_(funs(sum(. == 5, na.rm = TRUE)), myvars) %>%
  setNames(c(names(.)[1], paste0("counts.", myvars)))
#Source: local data frame [3 x 4]
#
#  device counts.a counts.b counts.c
#1      1        1        2        0
#2      2        0        1        0
#3      3        1        0        2
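Note that summarise_each_() and funs() were deprecated in later dplyr releases. As a rough sketch, assuming dplyr 1.0.0 or newer (an assumption, since the question does not state a version), the same grouped count can be written with across(), which also handles the "counts." prefix through its .names argument. The result name res is only there for illustration.

```r
library(dplyr)

# Same example data as in the question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA

# Assumption: dplyr >= 1.0.0, where across() is available.
# all_of() selects the columns named in the character vector myvars;
# .names builds each output name from the input column name.
res <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
res
```

Because the column names come from myvars, this should work for any number of columns without the separate setNames() step.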
There's another Q&A about how to name the new columns produced by dplyr's mutate_each (which behaves the same way as summarise_each) here: mutate_each in dplyr: how do I select certain columns and give new names to mutated columns?