Problem description
Is it possible to filter a data.frame for complete cases using dplyr? Calling complete.cases with a list of all the variables works, of course, but that is a) verbose when there are a lot of variables and b) impossible when the variable names are not known (e.g. in a function that processes any data.frame).
library(dplyr)
df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)
# Works, but every column has to be listed by name
df %>%
  filter(complete.cases(x1, x2))
Recommended answer
Try this:
df %>% na.omit
or this:
df %>% filter(complete.cases(.))
or this:
library(tidyr)
df %>% drop_na
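To address the case from the question where the column names are not known in advance, the second form above can be wrapped in a function. This is only a minimal sketch: the name keep_complete is hypothetical, and it assumes the magrittr pipe that dplyr re-exports, since the `.` placeholder is not available with the native `|>` pipe.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep only the rows that have
# no NA in any column, without naming the columns.
keep_complete <- function(data) {
  # `.` is the magrittr placeholder for the whole data frame
  data %>% filter(complete.cases(.))
}

df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

complete_df <- keep_complete(df)
print(complete_df)
```

Because complete.cases receives the whole data frame, the helper does not depend on how many columns there are or what they are called.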
If you want to filter based on one variable's missingness, use a conditional:
df %>% filter(!is.na(x1))
or
df %>% drop_na(x1)
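The same two idioms extend to a chosen subset of columns. The sketch below assumes an extra column x3, which is not in the original data and is added only to show that missing values outside the selected columns are left alone.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5),
  x3 = c(NA, "b", "c", "d")  # illustrative column, not in the original example
)

# Drop rows where x1 or x2 is missing; NAs in x3 are ignored.
by_filter  <- df %>% filter(!is.na(x1), !is.na(x2))
by_drop_na <- df %>% drop_na(x1, x2)

print(by_filter)
print(by_drop_na)
```

Both calls keep the first two rows here, including the row where only x3 is missing.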
Other answers indicate that, of the solutions above, na.omit is much slower, but that has to be balanced against the fact that it returns the row indices of the omitted rows in the na.action attribute, whereas the other solutions above do not.
str(df %>% na.omit)
## 'data.frame': 2 obs. of 2 variables:
## $ x1: num 1 2
## $ x2: num 1 2
## - attr(*, "na.action")= 'omit' Named int 3 4
## ..- attr(*, "names")= chr "3" "4"
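If the omitted rows are of interest, the na.action attribute can be read back with attr. The following is a small sketch in base R; the variable names cleaned, dropped_idx and removed_rows are chosen here for illustration.

```r
df <- data.frame(
  x1 = c(1, 2, 3, NA),
  x2 = c(1, 2, NA, 5)
)

cleaned <- na.omit(df)

# Row indices of the omitted rows, as stored by na.omit()
dropped <- attr(cleaned, "na.action")
dropped_idx <- as.integer(dropped)  # strips the class and names

# Look up the rows that were removed in the original data frame
removed_rows <- df[dropped_idx, ]
print(dropped_idx)
print(removed_rows)
```

For this df, dropped_idx holds rows 3 and 4, matching the str output above.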
ADDED: Updated to reflect the latest version of dplyr and the comments.
ADDED: Updated to reflect the latest version of tidyr and the comments.