Problem description
How can I select groups based on a condition on the individual rows, for example keep all groups that contain at least one (ANY) occurrence of a certain value, e.g. 4 (or any other condition that is TRUE at least once)? Phrased the other way around: if a group has no rows where the condition is true, the entire group should be removed.
Let's take a very simple data set with two groups. I want to select the group that has at least one row with a Value of 4 (i.e. group B here):
library(dplyr)
df <- data.frame(Group = LETTERS[c(1, 1, 2, 2)], Value = 1:4)
df
# Group Value
# 1 A 1 # Group A has no values == 4 ~~> remove entire group
# 2 A 2
# 3 B 3
# 4 B 4 # Group B has at least one 4 ~~> keep the whole group
Doing group_by() and then filter() (as in this post) will only select the individual rows that contain a value of 4, not the whole group:
df %>%
group_by(Group) %>%
filter(Value == 4)
# Group Value
# <fctr> <int>
# 1 B 4
Recommended answer
This turns out to be pretty easy: you just need to use the any() function inside the filter() call. Indeed, it appears that:
filter(any(...)) evaluates at the group_by() level: any() returns a single TRUE/FALSE per group, which applies to every row of that group,
filter(...) evaluates at the rowwise() level, even when preceded by group_by(): the condition gives one TRUE/FALSE per row.
So use:
df %>%
group_by(Group) %>%
filter(any(Value==4))
# Group Value
# <fctr> <int>
# 1 B 3
# 2 B 4
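The same pattern should carry over to the opposite question and to other row-level conditions. Below is a minimal sketch, assuming the four-row data printed in the question; the names keep_any, drop_any and keep_all are only illustrative and do not come from the original answer.

```r
library(dplyr)

# Same toy data as printed in the question: group A holds 1 and 2, group B holds 3 and 4
df <- data.frame(Group = c("A", "A", "B", "B"), Value = 1:4)

# Keep every group in which the condition is TRUE at least once
keep_any <- df %>%
  group_by(Group) %>%
  filter(any(Value == 4)) %>%
  ungroup()

# Drop those groups instead, keeping only groups without any 4
drop_any <- df %>%
  group_by(Group) %>%
  filter(!any(Value == 4)) %>%
  ungroup()

# Keep groups in which the condition holds for every row
keep_all <- df %>%
  group_by(Group) %>%
  filter(all(Value >= 3)) %>%
  ungroup()
```

Here keep_any and keep_all both contain the two rows of group B, while drop_any contains the two rows of group A.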
Interestingly, the same behaviour appears with mutate(); compare:
df %>%
group_by(Group) %>%
mutate(check1=any(Value==4),
check2=Value==4)
# Group Value check1 check2
# <fctr> <int> <lgl> <lgl>
# 1 A 1 FALSE FALSE
# 2 A 2 FALSE FALSE
# 3 B 3 TRUE FALSE
# 4 B 4 TRUE TRUE
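One way to read that output: any() collapses a group's rows into a single TRUE/FALSE that is recycled to every row of the group, whereas Value == 4 stays one value per row. The same idea can be sketched in base R with ave(). This is a rough equivalent, not a description of what dplyr does internally, and the data frame is redefined so the snippet runs on its own.

```r
# Same toy data as printed in the question
df <- data.frame(Group = c("A", "A", "B", "B"), Value = 1:4)

# One logical per row: TRUE only where Value is 4 (like check2)
row_flag <- df$Value == 4

# ave() applies any() within each Group and recycles the single
# result to every row of that group (like check1)
group_flag <- ave(row_flag, df$Group, FUN = any)

# Subsetting with the group-level flag keeps all rows of group B
kept <- df[group_flag, ]
```

Subsetting with row_flag instead would return only the single row where Value is 4, which mirrors the difference between filter(Value == 4) and filter(any(Value == 4)).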