Problem description
I have a big dataset that contains a lot of NAs and some non-NA values. At the moment I count the non-NA values for each column like this:
attach(df)
1000 - (sum(is.na(X1)))
1000 - (sum(is.na(X2)))
1000 - (sum(is.na(X3)))
1000 - (sum(is.na(X4)))
1000 - (sum(is.na(X5)))
...
detach(df)
So that is the total number of observations minus the number of NA values.
Is there a faster way that needs fewer lines of code and less typing, and that gives me a quick overview of all the columns and their numbers of non-NA values? Like a for loop or something?
I am looking for something like this:
X1 Amount of Non-Na-Values
X2 ...
X3 ...
X4
X5
X6
Thanks :)
Recommended answer
You can also call is.na on the entire data frame (which returns a logical matrix) and then call colSums on the negated result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
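To get the two-column overview asked for in the question, the named vector returned by colSums can be turned into a small data frame. The following is only a sketch: the tiny data frame and the names `non_na`, `overview`, and `non_na_sapply` are made up for illustration and are not from the original post.

```r
# Hypothetical example data (not the asker's dataset)
df <- data.frame(
  X1 = c(1, NA, 3, NA),
  X2 = c(NA, NA, "a", "b"),
  X3 = c(TRUE, FALSE, NA, TRUE)
)

# Non-NA count per column: is.na(df) gives a logical matrix,
# and colSums() adds up the TRUEs of its negation column by column
non_na <- colSums(!is.na(df))
# counts: X1 = 2, X2 = 2, X3 = 3

# Arrange the counts as a "column / count" overview table
overview <- data.frame(column = names(non_na), non_na = unname(non_na))
print(overview)

# Same counts computed column by column, which is close to the
# "for loop" idea in the question and avoids building the matrix
non_na_sapply <- sapply(df, function(col) sum(!is.na(col)))
```

The original `1000 - sum(is.na(X1))` approach can be written for all columns at once as `nrow(df) - colSums(is.na(df))`, which avoids hard-coding the number of rows.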