Problem description
I am trying to get many lm models to work in a function, and I need to automatically drop constant columns from my data.table. Thus, I want to keep only columns with two or more unique values, excluding NA from the count.
I tried several methods found on SO, but I am still unable to drop columns that contain only two values: a constant and NA.
My reproducible code:
library(data.table)
df <- data.table(x=c(1,2,3,NA,5), y=c(1,1,NA,NA,NA),z=c(NA,NA,NA,NA,NA),
d=c(2,2,2,2,2))
> df
    x  y  z d
1:  1  1 NA 2
2:  2  1 NA 2
3:  3 NA NA 2
4: NA NA NA 2
5:  5 NA NA 2
My intention is to drop columns y, z, and d since they are constant, including y, which has only one unique value when NAs are omitted.
I tried:
same <- sapply(df, function(.col){ all(is.na(.col)) || all(.col[1L] == .col)})
df1 <- df[ , !same, with = FALSE]
> df1
    x  y
1:  1  1
2:  2  1
3:  3 NA
4: NA NA
5:  5 NA
As seen, y is still there. Any help?
Recommended answer
Because you have a data.table, you may use uniqueN and its na.rm argument:
df[ , lapply(.SD, function(v) if(uniqueN(v, na.rm = TRUE) > 1) v)]
# x
# 1: 1
# 2: 2
# 3: 3
# 4: NA
# 5: 5
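If the names of the surviving columns are needed later, for example to build model formulas for lm, the same uniqueN test can be used to collect names first and subset afterwards. This is only a sketch of that variant; the names keep and df_kept are hypothetical and do not come from the original answer.

```r
library(data.table)

df <- data.table(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))

# Names of columns with at least two distinct non-NA values.
# `keep` is a hypothetical name chosen for this sketch.
keep <- names(df)[sapply(df, function(v) uniqueN(v, na.rm = TRUE) > 1)]

# Subset by name; with = FALSE makes data.table treat `keep`
# as a character vector of column names.
df_kept <- df[, keep, with = FALSE]
```

Here keep should contain only "x", since y and d each have one distinct non-NA value and z has none.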
A base alternative could be Filter(function(x) length(unique(x[!is.na(x)])) > 1, df)
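As a possible explanation of why the attempt in the question kept y: comparing against NA yields NA, so the test for y evaluates to NA rather than TRUE, which appears to be why y is not flagged for removal. The sketch below illustrates this in base R and wraps the Filter test in a helper; is_constant is a hypothetical name, and a plain data.frame is used here so that no package is required.

```r
y <- c(1, 1, NA, NA, NA)

# Comparisons with NA are NA; with no FALSE present, all() returns NA.
all(y[1L] == y)                     # NA
# FALSE || NA is NA, so y is neither clearly constant nor clearly varying.
all(is.na(y)) || all(y[1L] == y)    # NA

# Hypothetical helper: TRUE when a column has at most one distinct non-NA value.
is_constant <- function(col) length(unique(col[!is.na(col)])) <= 1L

df <- data.frame(x = c(1, 2, 3, NA, 5), y = c(1, 1, NA, NA, NA),
                 z = c(NA, NA, NA, NA, NA), d = c(2, 2, 2, 2, 2))

# Keep only the columns that are not constant.
kept <- Filter(Negate(is_constant), df)
names(kept)                         # "x"
```

Dropping the NAs before counting distinct values avoids the NA comparison altogether, which is the same idea as passing na.rm = TRUE to uniqueN.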