Given the following dplyr workflow:

library(dplyr)
mtcars %>%
    tibble::rownames_to_column(var = "model") %>%
    filter(grepl(x = model, pattern = "Merc")) %>%
    group_by(am) %>%
    summarise(meanMPG = mean(mpg))


I am interested in applying filter conditionally, depending on the value of applyFilter.



For applyFilter <- 1, rows are filtered using the "Merc" string; without the filter, all rows are returned.

applyFilter <- 1


mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter(model %in%
           if (applyFilter) {
             rownames(mtcars)[grepl(x = rownames(mtcars), pattern = "Merc")]
           } else
           {
             rownames(mtcars)
           }) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))


Problem

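The solution above is inefficient because the if/else expression is always evaluated. A more desirable approach would evaluate the filter step only when applyFilter <- 1.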

Attempt

The workflow I have in mind would look something like this:

mtcars %>%
    tibble::rownames_to_column(var = "model") %>%
    # Only apply filter step if condition is met
    if (applyFilter) {
        filter(grepl(x = model, pattern = "Merc"))
        }
    %>%
    # Continue
    group_by(am) %>%
    summarise(meanMPG = mean(mpg))


Naturally, the syntax above is incorrect. It only illustrates how the ideal workflow should look.



Desired answer


I am not interested in creating an interim object; the workflow should resemble:

startingObject
    %>%
    ...
    conditional filter
    ...
    final object

Ideally, I would like to arrive at a solution in which I can control whether the filter call is evaluated at all.

Best answer

How about this approach:

mtcars %>%
    tibble::rownames_to_column(var = "model") %>%
    filter(if (applyFilter == 1) grepl(x = model, pattern = "Merc") else TRUE) %>%
    group_by(am) %>%
    summarise(meanMPG = mean(mpg))


This means grepl is evaluated only when applyFilter is 1; otherwise filter simply recycles a TRUE, which keeps every row.
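To make the recycling behaviour concrete, here is a minimal sketch that runs the same condition with the flag off and then on. The names `cars`, `unfiltered` and `filtered` are assumptions introduced for illustration only; they do not appear in the original question or answer.

```r
library(dplyr)

# Hypothetical local copy with the row names moved into a column.
cars <- tibble::rownames_to_column(mtcars, var = "model")

# Flag off: the condition is a single TRUE, which filter() recycles
# across all rows, so nothing is dropped and grepl() never runs.
applyFilter <- 0
unfiltered <- filter(cars, if (applyFilter == 1) grepl("Merc", model) else TRUE)

# Flag on: grepl() runs and only the matching rows survive.
applyFilter <- 1
filtered <- filter(cars, if (applyFilter == 1) grepl("Merc", model) else TRUE)

nrow(unfiltered)
nrow(filtered)
```

Comparing the two row counts shows whether the filter step actually took effect.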



Another option is to use {}:

mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  {if (applyFilter == 1) filter(., grepl(x = model, pattern = "Merc")) else .} %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
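If the braces feel noisy, the same idea can be wrapped in a small helper. This is only a sketch: `filter_when` is a hypothetical function, not part of dplyr, and it assumes the filter expressions can simply be forwarded through `...`. Because R evaluates arguments lazily, the expressions in `...` are never touched when the condition is false.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): applies filter() only when
# `condition` is TRUE; otherwise the data is passed through unchanged
# and the filter expressions in `...` are never evaluated.
filter_when <- function(data, condition, ...) {
  if (isTRUE(condition)) filter(data, ...) else data
}

applyFilter <- 1  # flag named as in the question

result <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%
  filter_when(applyFilter == 1, grepl(x = model, pattern = "Merc")) %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

result
```

The pipeline stays linear, and the conditional step reads like any other verb.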




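There is, of course, another possible approach: you can simply break the pipe, filter conditionally, and then continue the pipe. (I know the OP did not ask for this; it is just an example for other readers.)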

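library(magrittr)  # provides the %<>% assignment pipe

# Note: %<>% overwrites the mtcars object in the workspace
mtcars %<>%
  tibble::rownames_to_column(var = "model")

if (applyFilter == 1) mtcars %<>% filter(grepl(x = model, pattern = "Merc"))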

mtcars %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))
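For readers who would rather not overwrite the built-in dataset, the same break-the-pipe idea can be sketched with ordinary assignment. The names `cars` and `summary_by_am` are assumptions chosen for this illustration.

```r
library(dplyr)

applyFilter <- 1  # flag named as in the question

# `cars` is a hypothetical local name; the built-in mtcars stays untouched.
cars <- tibble::rownames_to_column(mtcars, var = "model")

# The filter step runs only when the flag is set.
if (applyFilter == 1) {
  cars <- filter(cars, grepl(x = model, pattern = "Merc"))
}

summary_by_am <- cars %>%
  group_by(am) %>%
  summarise(meanMPG = mean(mpg))

summary_by_am
```

This does create an interim object, which the question wanted to avoid, so it is an alternative for other readers rather than an answer to the original request.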
