Problem description
Using the dplyr package and its sample_frac function, it is possible to sample a percentage of the rows in every group. What I need instead is to first sort the elements within every group and then select the top x% of each group.
There is a function top_n, but with it I can only specify an absolute number of rows, whereas I need a relative value (a proportion of each group).
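For reference: in dplyr 1.0.0 and later, top_n() is superseded by slice_max(), which accepts a relative size through its prop argument. The sketch below assumes dplyr >= 1.0.0 is installed. As I understand the dplyr documentation, prop * (group size) is rounded toward zero, so a 12-row group with prop = 0.2 yields 2 rows rather than 2.4; with_ties = FALSE keeps that count exact even when wt values tie.

```r
library(dplyr)

# Keep the heaviest 20% of cars within each gear group.
# prop is a fraction of each group's size; the resulting row count is
# rounded down, and with_ties = FALSE stops tied values from adding rows.
top20 <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  slice_max(wt, prop = 0.2, with_ties = FALSE) %>%
  ungroup()

print(top20)
```

With these data this returns 3, 2 and 1 rows for gears 3, 4 and 5 respectively.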
For example, the following data is grouped by gear and sorted by wt within each group:
library(dplyr)
mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  arrange(gear, wt)
gear wt
1 3 2.465
2 3 3.215
3 3 3.435
4 3 3.440
5 3 3.460
6 3 3.520
7 3 3.570
8 3 3.730
9 3 3.780
10 3 3.840
11 3 3.845
12 3 4.070
13 3 5.250
14 3 5.345
15 3 5.424
16 4 1.615
17 4 1.835
18 4 1.935
19 4 2.200
20 4 2.320
21 4 2.620
22 4 2.780
23 4 2.875
24 4 3.150
25 4 3.190
26 4 3.440
27 4 3.440
28 5 1.513
29 5 2.140
30 5 2.770
31 5 3.170
32 5 3.570
Now I would like to select the top 20% within each gear group.
It would be very nice if the solution could be integrated with dplyr's group_by function.
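One way to meet that requirement is to work with row positions rather than values: within each group, sort descending and keep the first ceiling(prop * n()) rows. The helper name top_frac_rows() below is hypothetical (it is not part of dplyr); it is a minimal sketch assuming a dplyr version that supports arrange(.by_group = TRUE) and the {{ }} embracing operator. Rounding up guarantees at least one row per non-empty group, and ties are broken by the existing row order.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): within each existing group,
# sort by `order_col` descending and keep the first ceiling(prop * n())
# rows, so every non-empty group keeps at least one row.
top_frac_rows <- function(data, order_col, prop) {
  data %>%
    arrange(desc({{ order_col }}), .by_group = TRUE) %>%
    # row_number() and n() are evaluated per group in a grouped filter
    filter(row_number() <= ceiling(prop * n()))
}

res <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  top_frac_rows(wt, 0.2)

print(res)
```

Because it rounds up, the 12-car gear-4 group keeps 3 rows (ceiling of 2.4); use floor() instead of ceiling() if rounding down is preferred.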
Recommended answer
Or another option with dplyr:
mtcars %>% select(gear, wt) %>%
  group_by(gear) %>%
  arrange(gear, desc(wt)) %>%          # heaviest cars first within each gear
  filter(wt > quantile(wt, .8))        # keep rows above the group's 80th percentile
Source: local data frame [7 x 2]
Groups: gear [3]
gear wt
(dbl) (dbl)
1 3 5.424
2 3 5.345
3 3 5.250
4 4 3.440
5 4 3.440
6 4 3.190
7 5 3.570
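Note that this filter keeps the rows strictly above each group's 80th percentile, which is not always exactly 20% of the rows. The sketch below computes the per-group cut-offs (R's default quantile type 7, linear interpolation) next to the row counts, to show where the 3/3/1 split above comes from: for gear 4, 20% of 12 cars is 2.4, yet 3 cars exceed the interpolated threshold.

```r
library(dplyr)

# Per-group 80th percentile of wt, i.e. the cut-off used by
# filter(wt > quantile(wt, .8)), compared with an exact 20% row count.
thresholds <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_cars  = n(),
    q80     = unname(quantile(wt, 0.8)),
    n_above = sum(wt > quantile(wt, 0.8)),
    target  = 0.2 * n()
  )

print(thresholds)
```

The thresholds come out at about 4.306, 3.182 and 3.25 for gears 3, 4 and 5.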