I want to split a data frame by multiple columns so that I can see the summary() output for each subset of the data.

Here is a way to do it with split() from base R:

library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(c(1:2), n(), replace = TRUE)) %>%
  split(list(.$GRP_A, .$GRP_B)) %>%
  map(summary)
#> $A.1
#>       mpg             cyl           disp          GRP_A
#>  Min.   :10.40   Min.   :4.0   Min.   :108.0   Length:10
#>  1st Qu.:14.97   1st Qu.:4.5   1st Qu.:151.9   Class :character
#>  Median :18.50   Median :7.0   Median :259.3   Mode  :character
#>  Mean   :17.61   Mean   :6.4   Mean   :283.4
#>  3rd Qu.:20.85   3rd Qu.:8.0   3rd Qu.:430.0
#>  Max.   :24.40   Max.   :8.0   Max.   :472.0
#>      GRP_B
#>  Min.   :1
#>  1st Qu.:1
#>  Median :1
#>  Mean   :1
#>  3rd Qu.:1
#>  Max.   :1
#>
#> $B.1
#>       mpg             cyl           disp          GRP_A
#>  Min.   :15.00   Min.   :4.0   Min.   : 75.7   Length:5
#>  1st Qu.:21.00   1st Qu.:4.0   1st Qu.: 78.7   Class :character
#>  Median :21.50   Median :4.0   Median :120.1   Mode  :character
#>  Mean   :24.06   Mean   :5.2   Mean   :147.1
#>  3rd Qu.:30.40   3rd Qu.:6.0   3rd Qu.:160.0
#>  Max.   :32.40   Max.   :8.0   Max.   :301.0
#>      GRP_B
#>  Min.   :1
#>  1st Qu.:1
#>  Median :1
#>  Mean   :1
#>  3rd Qu.:1
#>  Max.   :1
#>
#> $A.2
#>       mpg             cyl             disp          GRP_A
#>  Min.   :15.20   Min.   :4.000   Min.   : 95.1   Length:9
#>  1st Qu.:16.40   1st Qu.:6.000   1st Qu.:160.0   Class :character
#>  Median :18.10   Median :8.000   Median :275.8   Mode  :character
#>  Mean   :19.84   Mean   :6.667   Mean   :234.0
#>  3rd Qu.:21.00   3rd Qu.:8.000   3rd Qu.:275.8
#>  Max.   :30.40   Max.   :8.000   Max.   :360.0
#>      GRP_B
#>  Min.   :2
#>  1st Qu.:2
#>  Median :2
#>  Mean   :2
#>  3rd Qu.:2
#>  Max.   :2
#>
#> $B.2
#>       mpg             cyl         disp          GRP_A
#>  Min.   :13.30   Min.   :4   Min.   : 71.1   Length:8
#>  1st Qu.:14.97   1st Qu.:4   1st Qu.:125.3   Class :character
#>  Median :20.55   Median :6   Median :201.5   Mode  :character
#>  Mean   :20.99   Mean   :6   Mean   :213.5
#>  3rd Qu.:23.93   3rd Qu.:8   3rd Qu.:315.5
#>  Max.   :33.90   Max.   :8   Max.   :360.0
#>      GRP_B
#>  Min.   :2
#>  1st Qu.:2
#>  Median :2
#>  Mean   :2
#>  3rd Qu.:2
#>  Max.   :2
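A detail of the base approach above: when split() is given a list of grouping vectors, it splits on their interaction(). With the default drop = FALSE, you get one element for every possible combination, including an empty data frame for any combination that never occurs. The sample() calls above happened to produce all four combinations. The toy data frame below (hypothetical, chosen so that the combination B.2 is missing) is a minimal sketch of that behaviour and of drop = TRUE:

```r
# Toy data: the combination a = "B", b = 2 never occurs
df <- data.frame(x = 1:4,
                 a = c("A", "A", "B", "B"),
                 b = c(1, 2, 1, 1))

# Default drop = FALSE: one element per level of interaction(a, b),
# with the first grouping vector varying fastest, so the missing
# "B.2" combination still appears as an empty data frame
pieces <- split(df, list(df$a, df$b))
print(names(pieces))
print(nrow(pieces[["B.2"]]))

# drop = TRUE: combinations that do not occur are removed
pieces_dropped <- split(df, list(df$a, df$b), drop = TRUE)
print(names(pieces_dropped))
```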

How can I achieve the same result with tidyverse verbs? My first thought was to use purrr::by_slice(), but that function has apparently been deprecated.

Best answer

dplyr 0.8.0 introduced the verb you are looking for: group_split()
From the documentation:



For example:

library(tidyverse)  # needs dplyr >= 0.8.0 for group_split(); map() comes from purrr

mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(c(1:2), n(), replace = TRUE)) %>%
  group_split(GRP_A, GRP_B) %>%
  map(summary)
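One difference from the split() output: group_split() returns an unnamed list, so the summaries lose the "A.1"-style labels. Also, unlike split() with its default drop = FALSE, it only returns groups that actually occur in the data. A minimal sketch of one way to restore the labels, assuming dplyr >= 0.8.0 (which provides group_keys() alongside group_split()) and purrr. The set.seed() call and the names df, grouped, keys and summaries are illustrative additions, not part of the original answer:

```r
library(dplyr)
library(purrr)

set.seed(42)  # illustrative only: makes the random grouping reproducible
df <- mtcars %>%
  select(1:3) %>%
  mutate(GRP_A = sample(LETTERS[1:2], n(), replace = TRUE),
         GRP_B = sample(1:2, n(), replace = TRUE))

grouped <- df %>% group_by(GRP_A, GRP_B)

# group_keys() returns one row per group, in the same order as the
# tibbles returned by group_split()
keys <- group_keys(grouped)

# map() returns a plain list, which we then name like split() would
summaries <- grouped %>%
  group_split() %>%
  map(summary)
names(summaries) <- paste(keys$GRP_A, keys$GRP_B, sep = ".")

print(names(summaries))
```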

For the question "r - What is the 'tidyverse' way to split a df by multiple columns?", see the original Stack Overflow thread: https://stackoverflow.com/questions/42704919/

10-12 23:14