Subsetting columns using a logical vector


Problem description

I have a data frame from which I want to drop the columns whose NA rate is above 70%, or in which a single dominant value takes up more than 99% of the rows. How can I do that in R?
I find it easy to select rows with a logical vector in the subset function, but how can I do something similar for columns? For example, if I write:
isNARateLt70 <- function(column) {
  # some code
}
apply(dataframe, 2, isNARateLt70)
How can I then use the resulting vector to subset the data frame?
Solution
There's really no need to write a function when we have colMeans (thanks to @MrFlick for the suggestion to switch from colSums()/nrow(); the colMeans version is shown at the bottom of this answer).
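As a quick check that the two forms are interchangeable, the sketch below computes the per-column NA rate both ways. The data frame toy and its values are made up for illustration and are not part of the question.

```r
# Illustrative data only: 'toy' is a made-up data frame.
toy <- data.frame(a = c(NA, NA, 3, 4), b = c(1, 2, NA, 4))

# is.na() on a data frame returns a logical matrix of the same shape,
# so the column means of that matrix are the per-column NA proportions.
rate_means <- colMeans(is.na(toy))
rate_sums  <- colSums(is.na(toy)) / nrow(toy)

print(rate_means)
print(all.equal(rate_means, rate_sums))
```

Both vectors are named by column, so either one can be compared against a threshold to get a logical vector for subsetting.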
Here's how I would approach your function if you want to use sapply on it later.
> d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
                  z = c(rep(NA, 3), 1, 2))

> isNARateLt70 <- function(x) mean(is.na(x)) <= 0.7
> sapply(d, isNARateLt70)
#     x     y     z
# FALSE  TRUE  TRUE
Then, to subset your data using the logical vector from that line, it's
> d[sapply(d, isNARateLt70)]
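For reference, a logical vector can select columns in a few equivalent ways. The sketch below reuses d and isNARateLt70 from this answer; the names keep, by_list, by_matrix and by_filter are introduced here only for illustration.

```r
# Same example data and predicate as in the answer.
d <- data.frame(x = rep(NA, 5), y = c(1, NA, NA, 1, 1),
                z = c(rep(NA, 3), 1, 2))
isNARateLt70 <- function(x) mean(is.na(x)) <= 0.7

# Named logical vector with one entry per column.
keep <- sapply(d, isNARateLt70)

by_list   <- d[keep]                  # list-style indexing, result stays a data frame
by_matrix <- d[, keep, drop = FALSE]  # matrix-style; drop = FALSE avoids collapsing to a vector
by_filter <- Filter(isNARateLt70, d)  # base R helper that applies the predicate itself

print(all.equal(by_list, by_matrix))
print(names(by_filter))
```

The list-style form d[keep] is the shortest, and it never collapses a single remaining column into a plain vector, which the matrix-style form does unless drop = FALSE is given.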
But as mentioned, colMeans works just the same,
> d[colMeans(is.na(d)) <= 0.7]
#    y  z
# 1  1 NA
# 2 NA NA
# 3 NA NA
# 4  1  1
# 5  1  2
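The question also asks about dropping columns in which one value takes up more than 99% of the rows, which the answer above does not cover. One possible way to handle it is sketched below. The helper isDominantGt99 is hypothetical, it assumes NA should be counted as a value of its own, and the data frame d2 is made up for illustration.

```r
# Hypothetical helper (not from the original answer): TRUE when the most
# frequent value in x accounts for more than 99% of its entries.
# useNA = "ifany" counts NA as a value of its own; that is an assumption.
isDominantGt99 <- function(x) {
  max(table(x, useNA = "ifany")) / length(x) > 0.99
}

# Made-up data: 'const' is one repeated value, 'mostly_na' is 80% NA,
# and 'varied' has neither NAs nor a dominant value.
d2 <- data.frame(const     = rep(7, 200),
                 mostly_na = c(rep(NA, 160), 1:40),
                 varied    = 1:200)

naOK  <- colMeans(is.na(d2)) <= 0.7    # NA rate of at most 70%
domOK <- !sapply(d2, isDominantGt99)   # no single value above 99%

# Combine both logical vectors and subset the columns in one step.
result <- d2[naOK & domOK]
print(names(result))
```

Because both conditions produce one logical value per column, they can be combined with & before subsetting, so the data frame is only indexed once.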
                        