Problem description
I have a data frame md:
md <- data.frame(x = c(3,5,4,5,3,5), y = c(5,5,5,4,4,1), z = c(1,3,4,3,5,5),
device1 = c("c","a","a","b","c","c"), device2 = c("B","A","A","A","B","B"))
md[2,3] <- NA
md[4,1] <- NA
md
I want to calculate means by device1 / device2 combinations using dplyr:
library(dplyr)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean))
However, I am getting some NAs. I want the NAs to be ignored (na.rm = TRUE). I tried, but the function doesn't want to accept this argument. Both of these lines result in an error:
md %>% group_by(device1, device2) %>% summarise_each(funs(mean), na.rm = TRUE)
md %>% group_by(device1, device2) %>% summarise_each(funs(mean, na.rm = TRUE))
Recommended answer
The other answers showed you the syntax for passing mean(., na.rm = TRUE) into summarize/_each.
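For reference, here is a minimal sketch of that syntax on the question's data. It assumes dplyr >= 1.0.0, where across() supersedes the summarise_each()/funs() pair used in the question; the name res is only illustrative.

```r
library(dplyr)

# Rebuild the data frame from the question
md <- data.frame(
  x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
  device1 = c("c", "a", "a", "b", "c", "c"),
  device2 = c("B", "A", "A", "A", "B", "B")
)
md[2, 3] <- NA
md[4, 1] <- NA

# Older dplyr, matching the question's style (now deprecated):
#   md %>% group_by(device1, device2) %>%
#     summarise_each(funs(mean(., na.rm = TRUE)))

# Current dplyr: the lambda forwards na.rm = TRUE to mean() for every column
res <- md %>%
  group_by(device1, device2) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
res
```

Note that the (b, A) group has a single row whose x is NA, so its mean of x comes out as NaN rather than NA: once the NA is removed there is nothing left to average.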
Personally, I deal with this so often, and find it so annoying, that I just define the following convenience set of NA-aware basic functions (e.g. in my .Rprofile). You can then apply them in dplyr with summarize(mean_) and no pesky arg-passing; it also keeps the source code cleaner and more readable, which is another strong plus:
mean_ <- function(...) mean(..., na.rm = TRUE)
median_ <- function(...) median(..., na.rm = TRUE)
sum_ <- function(...) sum(..., na.rm = TRUE)
# population SD (divides by n, not n - 1), counting only the non-NA values
sd_ <- function(v) sqrt(sum_((v - mean_(v))^2) / sum(!is.na(v)))
cor_ <- function(...) cor(..., use = "pairwise.complete.obs")
max_ <- function(...) max(..., na.rm = TRUE)
min_ <- function(...) min(..., na.rm = TRUE)
pmax_ <- function(...) pmax(..., na.rm = TRUE)
pmin_ <- function(...) pmin(..., na.rm = TRUE)
table_ <- function(...) table(..., useNA = "ifany")
mode_ <- function(...) {
  tab <- table(...)  # table() drops NA values by default
  names(tab[tab == max(tab)])  # every value tied for the highest count
}
clamp_ <- function(..., minval = 0, maxval = 70) pmax(minval, pmin(maxval, ...))
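As a usage sketch, this is roughly how the mean_ wrapper from the list above plugs into a grouped summary on the question's data. It assumes dplyr >= 1.0.0 (for across()); the name by_device is only illustrative.

```r
library(dplyr)

# The NA-aware wrapper from the list above
mean_ <- function(...) mean(..., na.rm = TRUE)

# The data frame from the question
md <- data.frame(
  x = c(3, 5, 4, 5, 3, 5), y = c(5, 5, 5, 4, 4, 1), z = c(1, 3, 4, 3, 5, 5),
  device1 = c("c", "a", "a", "b", "c", "c"),
  device2 = c("B", "A", "A", "A", "B", "B")
)
md[2, 3] <- NA
md[4, 1] <- NA

# The wrapper is passed by name, so no na.rm argument appears in the pipeline
by_device <- md %>%
  group_by(device1, device2) %>%
  summarise(across(c(x, y, z), mean_), .groups = "drop")
by_device
```

Because the wrapper is an ordinary function, it also works outside dplyr, e.g. mean_(c(1, NA, 3)) or sapply(md[1:3], mean_).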
Really, you want to be able to flick one global switch once and for all, like na.action/na.pass/na.omit/na.fail, to tell functions what their default behavior should be, rather than having them throw errors or behave inconsistently across different packages, as they currently do.
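To illustrate the inconsistency: base R does have a global na.action option, but it is consulted by modelling functions such as lm(), not by summary functions such as mean(). A small sketch, using a made-up data frame d (hypothetical, not from the question):

```r
# Set the option explicitly so the example does not depend on session state;
# "na.omit" is also the default in a fresh R session
old <- options(na.action = "na.omit")

d <- data.frame(x = c(1, 2, NA, 4), y = c(2.1, 3.9, 6.2, 8.1))

# lm() honours the option and silently drops the incomplete row
fit <- lm(y ~ x, data = d)
n_used <- nobs(fit)

# mean() ignores the option; na.rm still has to be passed by hand
m_default <- mean(d$x)
m_narm <- mean(d$x, na.rm = TRUE)

options(old)  # restore the previous setting
```

Here n_used counts only the complete rows, while m_default is NA until na.rm = TRUE is supplied, which is the gap the wrappers above paper over.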
There used to be a CRAN package called Defaults for setting per-function defaults, but it has not been maintained since 2014 (pre-3.x). For more about it, see "Setting Function Defaults R on a Project Specific Basis".