dplyr, R: Counting a specific value in multiple columns at once

Problem description

I have a data frame:

md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
      device = c(1,1,2,2,3,3))
myvars = c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA
md

I want to count the number of 5s in each column, by device. I can do it like this:

library(dplyr)
group_by(md, device) %>%
  summarise(counts.a = sum(a == 5, na.rm = TRUE),
            counts.b = sum(b == 5, na.rm = TRUE),
            counts.c = sum(c == 5, na.rm = TRUE))

However, in real life I'll have many variables (myvars can be very long), so I can't write out counts.a, counts.b, etc. by hand dozens of times.

Does dplyr allow running the count of 5s on all myvars columns at once?

Thanks!

Recommended answer

If you care about the names starting with "counts.", you could do it like this in a dplyr pipe:

md %>%
  group_by(device) %>%
  summarise_each_(funs(sum(.==5,na.rm=TRUE)), myvars) %>%
  setNames(c(names(.)[1], paste0("counts.", myvars)))
#Source: local data frame [3 x 4]
#
#  device counts.a counts.b counts.c
#1      1        1        2        0
#2      2        0        1        0
#3      3        1        0        2
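
The answer above relies on summarise_each_() and funs(), both of which later dplyr releases have deprecated. As a rough sketch only, assuming dplyr 1.0.0 or later is installed, the same per-device counts can be written with across(); the result name `counts` is a hypothetical variable introduced here for illustration, and the data frame is the one from the question.

```r
library(dplyr)

# Same data as in the question
md <- data.frame(a = c(3, 5, 4, 5, 3, 5), b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5), device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Assumes dplyr >= 1.0.0, where across() is available.
# all_of() selects the columns named in the character vector myvars;
# .names prefixes each output column with "counts.".
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
counts
```

Because the prefix is set through .names, the separate setNames() step is not needed in this variant.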

There's another Q&A about how to name new columns produced by dplyr's mutate_each (which behaves the same way as summarise_each) here: mutate_each in dplyr: how do I select certain columns and give new names to mutated columns?
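
For the mutate side of that naming question, a minimal sketch under the same assumption (dplyr 1.0.0 or later) is shown below. The toy data frame `toy`, the result name `flagged`, and the "is5." prefix are all hypothetical and not taken from the linked Q&A; the point is only that .names lets mutate() add renamed columns next to the originals instead of overwriting them.

```r
library(dplyr)

# Hypothetical toy data, not from the original question
toy <- data.frame(a = c(3, 5, NA), b = c(5, 5, 1))

# Assumes dplyr >= 1.0.0. Each selected column gets a new logical
# companion column named "is5.<column>"; the originals are kept.
flagged <- toy %>%
  mutate(across(c(a, b), ~ .x == 5, .names = "is5.{.col}"))
flagged
```

Without .names, across() inside mutate() would replace a and b with the transformed values.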

08-21 09:06