r - When should I use "which" for subsetting?

Here is a toy example.

library(dplyr)

# Sepal.Width of the flower(s) with the longest sepal in each species
iris %>%
  group_by(Species) %>%
  summarise(max = Sepal.Width[Sepal.Length == max(Sepal.Length)])

 # A tibble: 3 x 2
  Species      max
  <fct>      <dbl>
1 setosa       4
2 versicolor   3.2
3 virginica    3.8

Using which() gives the same output.

iris %>%
  group_by(Species) %>%
  summarise(max = Sepal.Width[which(Sepal.Length == max(Sepal.Length))])
# summarise(max = Sepal.Width[which.max(Sepal.Length)])

# A tibble: 3 x 2
  Species      max
  <fct>      <dbl>
1 setosa       4
2 versicolor   3.2
3 virginica    3.8

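help(which) says:

Give the TRUE indices of a logical object, allowing for array indices.

== seems to do the same thing: it shows TRUE and FALSE.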

So when is which() useful for subsetting?
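To make the comparison concrete, here is a minimal sketch using a made-up vector `x` (not from the original post). It suggests that `==` yields a logical vector while `which()` yields integer positions, that both select the same elements here, and that `which.max()` (the commented-out alternative above) keeps only the first maximum when there are ties.

```r
# Hypothetical toy vector with a tied maximum (an assumption for illustration)
x <- c(5.1, 7.0, 6.3, 7.0)

is_max <- x == max(x)   # logical vector, one value per element of x
is_max
#> [1] FALSE  TRUE FALSE  TRUE

idx <- which(is_max)    # integer positions of the TRUE elements
idx
#> [1] 2 4

# Both index types select the same elements
identical(x[is_max], x[idx])
#> [1] TRUE

# which.max() returns only the first position of the maximum
which.max(x)
#> [1] 2
```

With tied maxima, the `==` and `which()` forms return more than one value per group, whereas the `which.max()` form returns exactly one.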

Best answer

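Since this question is specifically about subsetting, I thought I would
illustrate some of the performance benefits of using which() over the
logical subsetting brought up in the linked question.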

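When you want to extract the entire subset, there is not much difference in
processing speed, but using which() needs to allocate less memory. However, if
you only want a part of the subset (for example, to showcase some strange
findings), which() has a significant speed and memory advantage, because you
can subset the result of which() instead and so avoid subsetting the data
frame twice.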
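As a rough sketch of that idea, the three forms can be compared on a small built-in data frame. Here `mtcars` stands in for the benchmark data, and the names `keep`, `i`, and the `res_*` variables are made up for illustration. All three should return the same rows; only the last one subsets the data frame a single time.

```r
df <- mtcars                      # built-in data frame (stand-in, an assumption)
keep <- df$mpg > mean(df$mpg)     # logical condition over all rows
i <- seq_len(3)                   # only the first 3 matching rows are wanted

res_logical <- df[keep, ][i, ]          # two data frame subsets
res_which_1 <- df[which(keep), ][i, ]   # still two data frame subsets
res_which_2 <- df[which(keep)[i], ]     # subset the index vector, then the data frame once

# Check that the three approaches agree
stopifnot(isTRUE(all.equal(res_logical, res_which_1)))
stopifnot(isTRUE(all.equal(res_logical, res_which_2)))
```

The last form only touches a short integer vector before indexing the data frame, which is a plausible reason for the lower time and memory in the which_2 rows of a benchmark like the one in this answer.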

Here are the benchmarks:

df <- ggplot2::diamonds; dim(df)
#> [1] 53940    10
mu <- mean(df$price)

bench::press(
  n = c(sum(df$price > mu), 10),
  {
    i <- seq_len(n)
    bench::mark(
      logical = df[df$price > mu, ][i, ],
      which_1 = df[which(df$price > mu), ][i, ],
      which_2 = df[which(df$price > mu)[i], ]
    )
  }
)
#> Running with:
#>       n
#> 1 19657
#> 2    10
#> # A tibble: 6 x 11
#>   expression     n      min     mean   median      max `itr/sec` mem_alloc
#>   <chr>      <dbl> <bch:tm> <bch:tm> <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 logical    19657    1.5ms   1.81ms   1.71ms   3.39ms      553.     5.5MB
#> 2 which_1    19657   1.41ms   1.61ms   1.56ms   2.41ms      620.    2.89MB
#> 3 which_2    19657 826.56us 934.72us 910.88us   1.41ms     1070.    1.76MB
#> 4 logical       10 893.12us   1.06ms   1.02ms   1.93ms      941.    4.21MB
#> 5 which_1       10  814.4us 944.81us 908.16us   1.78ms     1058.    1.69MB
#> 6 which_2       10 230.72us 264.45us 249.28us   1.08ms     3781.  498.34KB
#> # ... with 3 more variables: n_gc <dbl>, n_itr <int>, total_time <bch:tm>

Created on 2018-08-19 by the reprex package (v0.2.0).

Source: the Stack Overflow question r - When should I use "which" for subsetting?, https://stackoverflow.com/questions/51914297/