Problem description
I would like to subset my data frame to keep only groups that have 3 or more observations on DIFFERENT days. I want to get rid of groups that have fewer than 3 observations, or whose observations do not come from 3 different days.
Here is an example dataset:
Group Day
1 1
1 3
1 5
1 5
2 2
2 2
2 4
2 4
3 1
3 2
3 3
4 1
4 5
So for the above example, groups 1 and 3 would be kept, and groups 2 and 4 would be removed from the data frame.
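To see why groups 2 and 4 fail the rule, it helps to compare, per group, the number of rows with the number of distinct days. The sketch below types the example data in by hand as a data frame named `DT` (the name is an assumption, chosen to match the answer) and uses base R `tapply()`; `n_rows`, `n_days` and `counts` are illustrative names.

```r
# Example data from the question, typed in by hand (assumed name: DT)
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# Per group: total number of rows vs. number of distinct days
n_rows <- tapply(DT$Day, DT$Group, length)
n_days <- tapply(DT$Day, DT$Group, function(d) length(unique(d)))
counts <- data.frame(
  Group  = names(n_rows),
  n_rows = as.vector(n_rows),
  n_days = as.vector(n_days)
)
print(counts)

# Groups that satisfy "at least 3 distinct days"
print(counts$Group[counts$n_days >= 3])
```

Group 2 has four rows but only two distinct days (2 and 4), so a plain row count would wrongly keep it; the filter has to count distinct days rather than rows.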
I hope this makes sense, I imagine the solution will be quite simple but I can't work it out (I'm quite new to R and not very fast at coming up with solutions to things like this). I thought maybe the diff function could come in handy but didn't get much further.
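The `diff()` idea can in fact be made to work: after sorting a group's days, every nonzero gap between consecutive values marks a new day, so the number of distinct days is the number of nonzero gaps plus one. The sketch below assumes the example data as `DT`; `distinct_days_via_diff` is a hypothetical helper, and `length(unique(d))` gives the same count more directly.

```r
# Example data from the question (assumed name: DT)
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# Hypothetical helper: count distinct days with diff() on the sorted days.
# Each nonzero gap between consecutive sorted days starts a new day,
# so distinct days = number of nonzero gaps + 1 (for a non-empty group).
distinct_days_via_diff <- function(d) sum(diff(sort(d)) != 0) + 1

n_days <- tapply(DT$Day, DT$Group, distinct_days_via_diff)
print(n_days)

# Groups with at least 3 distinct days
print(names(n_days)[n_days >= 3])
```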
Recommended answer
With data.table you can do:
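library(data.table)
setDT(DT)  # convert the data frame DT to a data.table in place
# keep all rows (.SD) of each Group that has at least 3 distinct Days
DT[, if (uniqueN(Day) >= 3) .SD, by = Group]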
This gives:
Group Day
1: 1 1
2: 1 3
3: 1 5
4: 1 5
5: 3 1
6: 3 2
7: 3 3
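A variant often used with data.table first collects the qualifying row numbers with `.I` and then subsets once, which can be faster on large tables because `.SD` is not materialised per group. This is an alternative idiom, not part of the answer above; the sketch assumes the example data as `DT`, and `keep` and `result` are illustrative names.

```r
library(data.table)

# Example data from the question (assumed name: DT)
DT <- data.table(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# For each Group, return that group's row numbers (.I) only if it has
# at least 3 distinct days; failing groups contribute no indices.
# The unnamed result column is called V1 by data.table.
keep <- DT[, .I[uniqueN(Day) >= 3], by = Group]$V1

# Subset the original table by those row numbers
result <- DT[keep]
print(result)
```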
Or with dplyr:
library(dplyr)
# keep only the groups with at least 3 distinct values of Day
DT %>%
  group_by(Group) %>%
  filter(n_distinct(Day) >= 3)
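Assuming dplyr 1.1.0 or later, `filter()` also accepts a `.by` argument, so the grouping applies to that single call and the result comes back ungrouped. A minimal sketch with the example data as `DT`:

```r
library(dplyr)

# Example data from the question (assumed name: DT)
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# Per-call grouping with .by (dplyr >= 1.1.0); output is not grouped
result <- DT %>%
  filter(n_distinct(Day) >= 3, .by = Group)
print(result)
```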
This gives the same result.
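If you would rather not load a package, a base R sketch of the same filter is possible with `ave()`, which returns, for every row, a value computed over that row's group; here that value is the number of distinct days. The example data is assumed to be called `DT`, and `n_days` and `result` are illustrative names.

```r
# Example data from the question (assumed name: DT)
DT <- data.frame(
  Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  Day   = c(1, 3, 5, 5, 2, 2, 4, 4, 1, 2, 3, 1, 5)
)

# For each row, the number of distinct days in that row's Group
n_days <- ave(DT$Day, DT$Group, FUN = function(d) length(unique(d)))

# Keep rows whose group has at least 3 distinct days
result <- DT[n_days >= 3, ]
print(result)
```

Because `ave()` returns a vector aligned with the original rows, the result keeps the original row order and row names.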