Problem description
I have this data.frame with a lot of NAs:
df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7),
                 stringsAsFactors = TRUE)
> df
  a  b
1 a NA
2 a NA
3 a NA
4 b  1
5 b NA
6 b  3
7 c NA
8 c NA
9 c  7
I would like to subset this data frame to keep only the rows of the factor groups that have at least two non-NA values in b, like this:
  a  b
1 b  1
2 b NA
3 b  3
I tried this, but it doesn't work:
subset(df, sum(!is.na(b)) < 1, by = a)
[1] a b
<0 rows> (or 0-length row.names)
Any suggestions? (Solutions using other packages are welcome.)
Recommended answer
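First, why the attempt in the question returns zero rows: subset() has no `by` argument, so the extra `by = a` is swallowed by `...` and never used, and the condition is evaluated once over the whole b column instead of per group. With 3 non-NA values in total, it collapses to a single FALSE that is recycled to every row. The comparison is also reversed: keeping groups with at least two values needs `> 1`, not `< 1`. The sketch below only illustrates this behaviour; `cond` and `res` are illustrative names.

```r
df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7),
                 stringsAsFactors = TRUE)

# The condition is computed over the whole column, not per group:
# b has 3 non-NA values in total, so this is a single FALSE.
cond <- sum(!is.na(df$b)) < 1
print(cond)

# subset() has no `by` argument; `by = a` is absorbed by `...` and ignored,
# which is why the call runs without an error and returns 0 rows.
res <- subset(df, sum(!is.na(b)) < 1, by = a)
print(nrow(res))
```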
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df), which converts it by reference), group by 'a', and if the sum of the logical vector (i.e. the number of non-NA elements, !is.na(b)) is greater than 1, return the subset of the data.table for that group (.SD).
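library(data.table)
# setDT() converts df to a data.table by reference; .SD holds the current group's rows,
# and groups for which the if() yields NULL are dropped from the result
setDT(df)[, if (sum(!is.na(b)) > 1) .SD, by = a]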
# a b
#1: b 1
#2: b NA
#3: b 3
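The data.table condition above can be unpacked into explicit steps. The following base R sketch (an equivalent formulation, not part of the original answer) counts the non-NA values of b for each level of a with tapply(), keeps the levels whose count exceeds 1, and then subsets the rows. For this data the counts are 0, 2 and 1 for groups a, b and c, so only group b survives.

```r
df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7),
                 stringsAsFactors = TRUE)

# Step 1: number of non-NA values of b within each level of a
counts <- tapply(!is.na(df$b), df$a, sum)
print(counts)

# Step 2: levels of a whose count is greater than 1
keep <- names(counts)[counts > 1]

# Step 3: rows of df that belong to those levels
res <- df[df$a %in% keep, ]
print(res)
```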
Or, using dplyr with the same logic: after grouping by 'a', we filter the rows.
library(dplyr)
df %>%
group_by(a) %>%
filter(sum(!is.na(b))>1)
# a b
# <fctr> <dbl>
#1 b 1
#2 b NA
#3 b 3
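To make the quantity that filter() tests visible, the per-group count can first be stored in a helper column. This sketch assumes dplyr is installed; the column name n_non_na is illustrative and is dropped again at the end.

```r
library(dplyr)

df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7),
                 stringsAsFactors = TRUE)

# Attach the number of non-NA values of b in each row's group
counted <- df %>%
  group_by(a) %>%
  mutate(n_non_na = sum(!is.na(b))) %>%
  ungroup()
print(counted)

# Keep rows whose group has more than one non-NA value, then drop the helper column
result <- counted %>%
  filter(n_non_na > 1) %>%
  select(-n_non_na)
print(result)
```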
Or, in base R, using ave:
# ave() recycles each group's result to all of its rows, coerced to 1 (TRUE) or 0 (FALSE)
df[with(df, ave(b, a, FUN = function(x) sum(!is.na(x)) > 1) != 0), ]
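The ave() approach also generalizes to other thresholds. keep_groups_min_non_na() below is a hypothetical helper (not from the original answer) that takes the grouping and value column names as strings and keeps every group with at least min_n non-NA values. Here ave() is applied to the logical vector !is.na(...) with FUN = sum, so it returns each row's per-group count directly and no `!= 0` comparison is needed.

```r
# Hypothetical helper: keep all rows of groups whose `value` column has at
# least `min_n` non-NA entries. `group` and `value` are column names (strings).
keep_groups_min_non_na <- function(data, group, value, min_n = 2) {
  # Per-row count of non-NA values in that row's group
  counts <- ave(!is.na(data[[value]]), data[[group]], FUN = sum)
  data[counts >= min_n, , drop = FALSE]
}

df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7),
                 stringsAsFactors = TRUE)

res2 <- keep_groups_min_non_na(df, "a", "b")             # at least 2 values: group b
res1 <- keep_groups_min_non_na(df, "a", "b", min_n = 1)  # at least 1 value: groups b and c
print(res2)
print(res1)
```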