使用 dplyr group_by 模拟 split():返回数据帧列表

Note that split names each list by the name of the factor that has been used to establish this group - this is a desired function (ultimately, bonus kudos for a way to extract these names from the list of dfs).推荐答案group_split in dplyr:Dplyr 已经实现了 group_split:https://dplyr.tidyverse.org/reference/group_split.htmlDplyr has implemented group_split:https://dplyr.tidyverse.org/reference/group_split.html它按组拆分数据帧，返回数据帧列表.这些数据帧中的每一个都是由拆分变量的类别定义的原始数据帧的子集.It splits a dataframe by a groups, returns a list of dataframes. Each of these dataframes are subsets of the original dataframes defined by categories of the splitting variable.例如.用变量Species分割数据集iris，并计算每个子数据集的摘要:For example. Split the dataset iris by the variable Species, and calculate summaries of each sub-dataset:> iris %>%+ group_split(Species) %>%+ map(summary)[[1]] Sepal.Length Sepal.Width Petal.Length Petal.Width Species Min. :4.300 Min. :2.300 Min. :1.000 Min. :0.100 setosa :50 1st Qu.:4.800 1st Qu.:3.200 1st Qu.:1.400 1st Qu.:0.200 versicolor: 0 Median :5.000 Median :3.400 Median :1.500 Median :0.200 virginica : 0 Mean :5.006 Mean :3.428 Mean :1.462 Mean :0.246 3rd Qu.:5.200 3rd Qu.:3.675 3rd Qu.:1.575 3rd Qu.:0.300 Max. :5.800 Max. :4.400 Max. :1.900 Max. :0.600[[2]] Sepal.Length Sepal.Width Petal.Length Petal.Width Species Min. :4.900 Min. :2.000 Min. :3.00 Min. :1.000 setosa : 0 1st Qu.:5.600 1st Qu.:2.525 1st Qu.:4.00 1st Qu.:1.200 versicolor:50 Median :5.900 Median :2.800 Median :4.35 Median :1.300 virginica : 0 Mean :5.936 Mean :2.770 Mean :4.26 Mean :1.326 3rd Qu.:6.300 3rd Qu.:3.000 3rd Qu.:4.60 3rd Qu.:1.500 Max. :7.000 Max. :3.400 Max. :5.10 Max. :1.800[[3]] Sepal.Length Sepal.Width Petal.Length Petal.Width Species Min. :4.900 Min. :2.200 Min. :4.500 Min. :1.400 setosa : 0 1st Qu.:6.225 1st Qu.:2.800 1st Qu.:5.100 1st Qu.:1.800 versicolor: 0 Median :6.500 Median :3.000 Median :5.550 Median :2.000 virginica :50 Mean :6.588 Mean :2.974 Mean :5.552 Mean :2.026 3rd Qu.:6.900 3rd Qu.:3.175 3rd Qu.:5.875 3rd Qu.:2.300 Max. :7.900 Max. :3.800 Max. :6.900 Max. :2.500它对于调试嵌套数据帧上的计算也非常有帮助，因为它是查看"嵌套数据帧内部"计算中发生的事情的快速方法.It is also very helpful for debugging a calculations on nested dataframes, because it is an quick way to "see" what is going on "inside" the calculations on nested dataframes. 这篇关于使用 dplyr group_by 模拟 split():返回数据帧列表的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！上岸，阿里云！