Problem description
I'm doing some programming using dplyr, and am curious about how to pass an expression as an argument (specifically as a MoreArgs argument) to mapply.
Consider a simple function F that subsets a data.frame based on some ids and a time_range, then outputs a summary statistic based on some other column x.
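library(dplyr)
# Note: %chin% comes from data.table, which is loaded further down.
F <- function(ids, time_range, df, date_column, x) {
  # Capture the unquoted column names so they can be unquoted with !! below.
  date_column <- enquo(date_column)
  x <- enquo(x)
  df %>%
    filter(person_id %chin% ids) %>%
    filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
    summarise(newvar = sum(!!x))
}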
We can make up some example data to which we can apply our function F.
person_ids <- lapply(1:2, function(i) sample(letters, size = 10))
time_ranges <- lapply(list(c("2014-01-01", "2014-12-31"),
                           c("2015-01-01", "2015-12-31")), as.Date)
library(data.table)
dt <- CJ(person_id = letters,
         date_col = seq.Date(from = as.Date('2014-01-01'), to = as.Date('2015-12-31'), by = '1 day'))
dt[, z := rnorm(nrow(dt))] # The variable we will later sum over, i.e. apply F to.
We can successfully apply our function to each of our inputs.
F(person_ids[[1]], time_ranges[[1]], dt, date_col, z)
F(person_ids[[2]], time_ranges[[2]], dt, date_col, z)
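The two calls above generalize naturally to a loop over the paired inputs. Below is a minimal sketch under stated assumptions: sum_in_range is a hypothetical stand-in for F (it uses %in% rather than %chin% so that only dplyr is needed), and toy, id_sets and ranges are made-up data, not the data from the question.

```r
library(dplyr)

# Hypothetical stand-in for F; %in% replaces %chin% so data.table is not required.
sum_in_range <- function(ids, time_range, df, date_column, x) {
  date_column <- enquo(date_column)
  x <- enquo(x)
  df %>%
    filter(person_id %in% ids) %>%
    filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
    summarise(newvar = sum(!!x))
}

# Made-up example data (an assumption, much smaller than dt in the question).
toy <- data.frame(
  person_id = c("a", "a", "b", "b"),
  date_col  = as.Date(c("2014-03-01", "2015-03-01", "2014-06-01", "2015-06-01")),
  z         = c(1, 2, 3, 5)
)
id_sets <- list(c("a", "b"), "b")
ranges  <- lapply(list(c("2014-01-01", "2014-12-31"),
                       c("2015-01-01", "2015-12-31")), as.Date)

# Loop over the paired inputs; the column names stay unquoted at the call site.
results <- vector("list", length(id_sets))
for (i in seq_along(id_sets)) {
  results[[i]] <- sum_in_range(id_sets[[i]], ranges[[i]], toy, date_col, z)
}
print(results)
```

Because date_col and z are written directly in the call inside the loop body, enquo can capture them, and no list of arguments has to be built first.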
So if I wanted, I could write a simple for-loop to solve my problem. But if we try to apply syntactic sugar and wrap everything within mapply, we get an error.
mapply(F, ids = person_ids, time_range = time_ranges, MoreArgs = list(df = dt, date_column = date_col, x = z))
# Error in mapply... object 'date_col' not found
Recommended answer
In mapply, MoreArgs is supplied as a list, and R evaluates the list elements as soon as the list is built. That raises the error, because date_col and z are not objects in the calling environment. As suggested by @Gregor, you can quote the MoreArgs elements that should not be evaluated immediately, which prevents the error and lets the function proceed. This can be done with base quote or dplyr's quo:
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quote(date_col), quote(z)))
mapply(F, person_ids, time_ranges, MoreArgs = list(dt, quo(date_col), quo(z)))
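The difference between the failing and the working MoreArgs can be seen in base R alone. The sketch below uses a made-up name, some_unbound_column, that deliberately does not exist as an object; it only illustrates how list() and quote() treat such a name and is not part of the original answer.

```r
# list() evaluates each element immediately, so an unbound name fails
# while the list is being built, before mapply() is ever reached.
msg <- tryCatch(
  list(col = some_unbound_column),
  error = function(e) conditionMessage(e)
)
print(msg)

# quote() stores the name itself (a symbol) instead of looking it up,
# so building the list succeeds.
more_args <- list(col = quote(some_unbound_column))
print(is.symbol(more_args$col))  # TRUE
```

The quoted symbol is then handed to the function, where enquo can pick it up as a column name.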
Another option is to use map2 from the purrr package, which is the tidyverse equivalent of mapply with two input vectors. tidyverse functions are set up to work with non-standard evaluation, which avoids the error you're getting with mapply without the need to quote the arguments:
library(purrr)
map2(person_ids, time_ranges, F, dt, date_col, z)
[[1]]
    newvar
1 40.23419

[[2]]
    newvar
1 71.42327
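One plausible reading of why the map2 call works is that its extra arguments travel through ... and are not evaluated on the way, whereas a MoreArgs list has to be built up front. The base R sketch below illustrates that general distinction only; forward_dots, forward_list and show_expr are hypothetical helpers and make no claim about how purrr is implemented internally.

```r
# Reports the expression supplied for x without evaluating it.
show_expr <- function(x) deparse(substitute(x))

# Hypothetical mapper: forwards extra arguments through `...` unevaluated.
forward_dots <- function(f, ...) f(...)

# Hypothetical mapper: takes extra arguments as a pre-built list instead.
forward_list <- function(f, more_args) do.call(f, more_args)

# The name survives the trip through `...` even though no such object exists.
via_dots <- forward_dots(show_expr, some_unbound_column)
print(via_dots)

# Building the list forces evaluation of the name, which fails.
via_list <- tryCatch(
  forward_list(show_expr, list(some_unbound_column)),
  error = function(e) conditionMessage(e)
)
print(via_list)
```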
More generally, you could use pmap, which iterates in parallel over any number of input vectors:
pmap(list(person_ids, time_ranges), F, dt, date_col, z)
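A variation that avoids passing the constant arguments separately is to wrap the call in an anonymous function, so the column names appear directly in the call to the summarising function. As I understand it, recent purrr documentation leans toward this style over passing extra arguments through ..., and the same wrapper also works with mapply. The sketch below is self-contained and rests on assumptions: sum_in_range is a hypothetical stand-in for F (with %in% instead of %chin%), and toy, id_sets and ranges are made-up data.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for F; %in% replaces %chin% so data.table is not required.
sum_in_range <- function(ids, time_range, df, date_column, x) {
  date_column <- enquo(date_column)
  x <- enquo(x)
  df %>%
    filter(person_id %in% ids) %>%
    filter(time_range[1] <= (!!date_column) & (!!date_column) <= time_range[2]) %>%
    summarise(newvar = sum(!!x))
}

# Made-up example data (an assumption, much smaller than dt in the question).
toy <- data.frame(
  person_id = c("a", "a", "b", "b"),
  date_col  = as.Date(c("2014-03-01", "2015-03-01", "2014-06-01", "2015-06-01")),
  z         = c(1, 2, 3, 5)
)
id_sets <- list(c("a", "b"), "b")
ranges  <- lapply(list(c("2014-01-01", "2014-12-31"),
                       c("2015-01-01", "2015-12-31")), as.Date)

# purrr version: the wrapper receives one element of each input per iteration.
results <- pmap(list(id_sets, ranges),
                function(ids, tr) sum_in_range(ids, tr, toy, date_col, z))

# Base R version: SIMPLIFY = FALSE keeps the results as a list of data frames.
results_base <- mapply(function(ids, tr) sum_in_range(ids, tr, toy, date_col, z),
                       id_sets, ranges, SIMPLIFY = FALSE)
print(results)
```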