Problem description
I want to parameterise the following computation using dplyr, which finds the values of Sepal.Length that are associated with more than one value of Sepal.Width:
library(dplyr)
iris %>%
  group_by(Sepal.Length) %>%
  summarise(n.uniq = n_distinct(Sepal.Width)) %>%
  filter(n.uniq > 1)
Normally I would write it like this:
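not.uniq.per.group <- function(data, group.var, uniq.var) {
  # Fails: group_by() and summarise() look for columns literally
  # named group.var and uniq.var instead of using the arguments' values
  data %>%
    group_by(group.var) %>%
    summarise(n.uniq = n_distinct(uniq.var)) %>%
    filter(n.uniq > 1)
}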
However, this approach throws errors because dplyr uses non-standard evaluation. How should this function be written?
Recommended answer
You need to use the standard evaluation versions of the dplyr functions (just append '_' to the function names, i.e. group_by_ and summarise_) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you will need to use interp(), which is defined in the lazyeval package. Concretely:
library(dplyr)
library(lazyeval)
not.uniq.per.group <- function(df, grp.var, uniq.var) {
  df %>%
    group_by_(grp.var) %>%
    summarise_(n_uniq = interp(~n_distinct(v), v = as.name(uniq.var))) %>%
    filter(n_uniq > 1)
}
not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
Note that in recent versions of dplyr the standard evaluation versions of the dplyr functions (the underscore-suffixed ones such as group_by_ and summarise_) have been "soft deprecated" in favor of non-standard evaluation.
See the "Programming with dplyr" vignette for more information on working with non-standard evaluation.
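As a rough illustration of the non-standard evaluation approach, here is a minimal sketch of how the same function might be written without the underscore-suffixed functions. It is not part of the original answer and assumes dplyr 1.0.0 or later. The names not_uniq_per_group_str and not_uniq_per_group_bare are made up for this example. The first variant takes column names as strings, the second takes bare column names.

library(dplyr)

# Variant 1 (hypothetical name): column names are passed as strings.
# across(all_of()) selects the grouping column by name, and the .data
# pronoun looks up the other column by name inside summarise().
not_uniq_per_group_str <- function(df, grp_var, uniq_var) {
  df %>%
    group_by(across(all_of(grp_var))) %>%
    summarise(n_uniq = n_distinct(.data[[uniq_var]])) %>%
    filter(n_uniq > 1)
}

# Variant 2 (hypothetical name): bare column names are passed and
# forwarded to the dplyr verbs with the embrace operator {{ }}.
not_uniq_per_group_bare <- function(df, grp_var, uniq_var) {
  df %>%
    group_by({{ grp_var }}) %>%
    summarise(n_uniq = n_distinct({{ uniq_var }})) %>%
    filter(n_uniq > 1)
}

res_str  <- not_uniq_per_group_str(iris, "Sepal.Length", "Sepal.Width")
res_bare <- not_uniq_per_group_bare(iris, Sepal.Length, Sepal.Width)

Both calls should return the same rows. Which variant fits better depends on whether callers naturally hold column names as strings (for example, read from a configuration file) or type them interactively.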