

How can I select groups based on a condition on the individual rows? For example, keep all groups that contain at least one (ANY) row with a certain value, e.g. 4 (or any other condition that is TRUE at least once). Phrased the other way around: if a group has no rows where the condition is true, the entire group should be removed.

Let's take a very simple data set with two groups. I want to select the group that has at least one row with a Value of 4 (i.e. group B here):

df <- data.frame(Group = LETTERS[c(1,1,2,2)], Value = 1:4)

#   Group Value
# 1     A     1 # Group A has no values == 4 ~~> remove entire group
# 2     A     2
# 3     B     3
# 4     B     4 # Group B has at least one 4 ~~> keep the whole group

Doing group_by() and then filter() (as in this post) will only select the individual rows that contain a value of 4, not the whole group:

df %>%
  group_by(Group) %>%
  filter(Value == 4)
#    Group Value
#   <fctr> <int>
# 1      B     4


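This turns out to be pretty easy: you just need to use the any() function in the filter call. Indeed, it appears that filter(any(...)) is evaluated once per group, with the single result applied to every row of that group, whereas a bare filter(...) is evaluated at the rowwise() level, i.e. row by row.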


df %>%
  group_by(Group) %>%
  filter(any(Value == 4))
#    Group Value
#   <fctr> <int>
# 1      B     3
# 2      B     4
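
For reference, here is a minimal self-contained sketch of the same idea, assuming dplyr is installed. The names kept and dropped are only illustrative and do not come from the original post. It also covers the inverse case (removing every group that contains a 4) by negating any().

```r
library(dplyr)

# Toy data matching the table printed in the question
# (rebuilt here so the snippet runs on its own)
df <- data.frame(Group = c("A", "A", "B", "B"), Value = 1:4)

# Keep every group that contains at least one row with Value == 4
kept <- df %>%
  group_by(Group) %>%
  filter(any(Value == 4)) %>%
  ungroup()

# Inverse: drop every group that contains at least one row with Value == 4
dropped <- df %>%
  group_by(Group) %>%
  filter(!any(Value == 4)) %>%
  ungroup()

print(kept)
print(dropped)
```

Any other per-row condition can be swapped in for Value == 4; all() in place of any() would instead require every row of the group to match.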


Interestingly, the same happens with mutate(). Compare:

df %>%
  group_by(Group) %>%
  mutate(check1 = any(Value == 4),
         check2 = Value == 4)
#    Group Value check1 check2
#   <fctr> <int>  <lgl>  <lgl>
# 1      A     1  FALSE  FALSE
# 2      A     2  FALSE  FALSE
# 3      B     3   TRUE  FALSE
# 4      B     4   TRUE   TRUE
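
To see the difference between the row-level and the group-level test without any dplyr grouping, a rough base R sketch may help. Here ave() is only a stand-in for the grouped evaluation, used for illustration; it is not a claim about how dplyr works internally. The names row_hit and group_hit are hypothetical.

```r
# Toy data matching the table printed in the question
df <- data.frame(Group = c("A", "A", "B", "B"), Value = 1:4)

# Row-level test: one TRUE/FALSE per row (corresponds to check2)
row_hit <- df$Value == 4

# Group-level test: any() is applied within each Group and its single
# result is recycled to every row of that group (corresponds to check1)
group_hit <- ave(row_hit, df$Group, FUN = any)

# Subsetting with the group-level vector keeps whole groups
result <- df[group_hit, ]
print(result)
```

Subsetting with row_hit instead of group_hit would return only the single row where Value is 4, which is the behaviour the question ran into.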

