Problem description
I can't figure out how to pass column names as arguments to a function that uses the dplyr R package. A reproducible example is below:
Data
set.seed(12345)
Y <- rnorm(10)
Env <- paste0("E", rep(1:2, each = 5))
Gen <- paste0("G", rep(1:5, times = 2))
df1 <- data.frame(Y, Env, Gen)
Works outside a function
library(dplyr)
df1 %>%
dplyr::group_by(Env, Gen) %>%
dplyr::summarize(mean(Y))
with(data = df1, expr = tapply(X = Y, INDEX = list(Env, Gen), FUN = mean))
First function
fn1 <- function(Y, E, G, data){
Y <- deparse(substitute(Y))
E <- deparse(substitute(E))
G <- deparse(substitute(G))
Out <- with(data = data, tapply(X = Y, INDEX = list(E, G), FUN = mean), parent.frame())
return(Out)
}
fn1(Y = Y, E = Env, G = Gen, data = df1)
Second function
fn2 <- function(Y, E, G, data){
Y <- deparse(substitute(Y))
E <- deparse(substitute(E))
G <- deparse(substitute(G))
library(dplyr)
Out <- df1 %>%
dplyr::group_by(E, G) %>%
dplyr::summarize(mean(Y))
return(Out)
}
fn2(Y = Y, E = Env, G = Gen, data = df1)
Recommended answer
One option is to use enquo to capture each argument's expression together with its environment in a quosure object. The quosure can then be evaluated inside group_by, summarise, mutate, etc. by unquoting it with the !! operator or with UQ.
fn2 <- function(Y, E, G, data){
E <- enquo(E)
G <- enquo(G)
Y <- enquo(Y)
data %>%
dplyr::group_by(!! E, !! G) %>%
dplyr::summarize(Y = mean(!!Y))
}
fn2(Y, E = Env, G = Gen, df1)
# A tibble: 10 x 3
# Groups: Env [?]
# Env Gen Y
# <fctr> <fctr> <dbl>
# 1 E1 G1 0.586
# 2 E1 G2 0.709
# 3 E1 G3 -0.109
# 4 E1 G4 -0.453
# 5 E1 G5 0.606
# 6 E2 G1 -1.82
# 7 E2 G2 0.630
# 8 E2 G3 -0.276
# 9 E2 G4 -0.284
#10 E2 G5 -0.919
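As a side note, the enquo/!! pair can be written more compactly with the embrace operator {{ }}, which is available in rlang 0.4.0 and later. The sketch below assumes dplyr 1.0.0 or later (for the .groups argument); fn2_embrace is a hypothetical name used only for illustration, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# fn2_embrace is a hypothetical name, not part of the original answer.
# {{ arg }} captures the argument and unquotes it in one step,
# which is equivalent to !! enquo(arg).
fn2_embrace <- function(data, Y, E, G) {
  data %>%
    group_by({{ E }}, {{ G }}) %>%
    summarise(Y = mean({{ Y }}), .groups = "drop")
}

res_embrace <- fn2_embrace(df1, Y = Y, E = Env, G = Gen)
res_embrace
```

The call site is unchanged: column names are still passed bare, without quotes.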
In the OP's function, the expression is captured by substitute, but deparse then converts it to a string. dplyr verbs treat a bare string as a constant rather than as a column reference, which is why the original fn2 fails. Using sym from rlang, the string can be converted to a symbol and then evaluated with !! or UQ as above.
fn2 <- function(Y, E, G, data){
Y <- deparse(substitute(Y))
E <- deparse(substitute(E))
G <- deparse(substitute(G))
# use the data argument rather than the global df1
data %>%
dplyr::group_by(!!rlang::sym(E), !! rlang::sym(G)) %>%
dplyr::summarize(Y = mean(!! rlang::sym(Y)))
}
fn2(Y = Y, E = Env, G = Gen, data = df1)
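The same string issue explains why the base R fn1 in the question fails: after deparse, tapply receives the literal strings "Y", "Env" and "Gen" rather than the columns. A minimal sketch of one possible fix, assuming the deparse(substitute()) pattern is kept, is to index the data frame with [[ ]]. The name fn1_fixed is hypothetical, and the data is rebuilt so the block runs on its own.

```r
# Same example data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# fn1_fixed is a hypothetical name, not part of the original answer.
fn1_fixed <- function(Y, E, G, data) {
  # Turn the bare argument names into strings
  Y <- deparse(substitute(Y))
  E <- deparse(substitute(E))
  G <- deparse(substitute(G))
  # Look the columns up by name instead of passing the strings themselves
  tapply(X = data[[Y]], INDEX = list(data[[E]], data[[G]]), FUN = mean)
}

res_base <- fn1_fixed(Y = Y, E = Env, G = Gen, data = df1)
res_base
```

This returns a matrix with one row per Env level and one column per Gen level, and needs no packages beyond base R.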
Another variant of the OP's function that does not use rlang is to call group_by_at and summarise_at, which accept column names as strings.
fn3 <- function(Y, E, G, data){
Y <- deparse(substitute(Y))
E <- deparse(substitute(E))
G <- deparse(substitute(G))
# E, G and Y hold column names as strings; both *_at verbs accept character vectors
data %>%
dplyr::group_by_at(c(E, G)) %>%
dplyr::summarize_at(Y, mean)
}
fn3(Y = Y, E = Env, G = Gen, data = df1)
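In dplyr 1.0.0 and later, the scoped _at verbs are superseded by across(). A hedged sketch of the string-based approach in that style follows. It assumes dplyr 1.0.0 or later and takes the column names directly as character strings; fn3_across is a hypothetical name, and the data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same example data as in the question
set.seed(12345)
df1 <- data.frame(
  Y   = rnorm(10),
  Env = paste0("E", rep(1:2, each = 5)),
  Gen = paste0("G", rep(1:5, times = 2))
)

# fn3_across is a hypothetical name, not part of the original answer.
# Y, E and G are character strings naming columns of `data`.
fn3_across <- function(Y, E, G, data) {
  data %>%
    # all_of() selects columns whose names are stored in a character vector
    group_by(across(all_of(c(E, G)))) %>%
    summarise(across(all_of(Y), mean), .groups = "drop")
}

res_across <- fn3_across(Y = "Y", E = "Env", G = "Gen", data = df1)
res_across
```

Taking strings directly avoids the deparse(substitute()) step, at the cost of quoting the column names at the call site.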