Problem description
I'm quite new to this.
I have a similar problem to this question: "group by and then count missing variables?"
Taking the input data from that question:
df1 <- data.frame(
  Z  = sample(LETTERS[1:5], size = 10000, replace = TRUE),
  X1 = sample(c(1:10, NA), 10000, replace = TRUE),
  X2 = sample(c(1:25, NA), 10000, replace = TRUE),
  X3 = sample(c(1:5, NA), 10000, replace = TRUE))
As one user proposed, it's possible to use summarise_each:
library(dplyr)
# Note: summarise_each() and funs() are deprecated in recent dplyr releases
df1 %>%
  group_by(Z) %>%
  summarise_each(funs(sum(is.na(.))))
# Source: local data frame [5 x 4]
#
#        Z    X1    X2    X3
#   (fctr) (int) (int) (int)
# 1      A   169    77   334
# 2      B   170    77   316
# 3      C   159    78   348
# 4      D   181    79   326
# 5      E   174    69   341
However, I would like to get only the total number of missing values per group.
I've also tried the approach from "R count NA by group", but it didn't work.
Ideally, it should give me something like:
#        Z sumNA
#   (fctr) (int)
# 1      A   580
# 2      B   563
# 3      C   585
# 4      D   586
# 5      E   584
Thanks.
Recommended answer
A data.table solution:
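library(data.table)
setDT(df1)  # convert df1 to a data.table by reference
# .SD holds the current group's columns other than Z, so this counts every NA in the group
df1[, .(sumNA = sum(is.na(.SD))), by = Z]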
#    Z sumNA
# 1: A   559
# 2: C   661
# 3: E   596
# 4: B   597
# 5: D   560
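For readers who do not have data.table installed, the same idea (take each group's non-grouping columns and count every NA in them) can be sketched in base R. This is only an illustration, not part of the original answer: the small data frame `toy` and the names `na_by_group` and `result` are made up here so that the counts can be checked by hand.

```r
# Toy data (hypothetical, not the question's df1): one grouping column, two value columns
toy <- data.frame(
  Z  = c("A", "A", "B", "B"),
  X1 = c(1, NA, 3, NA),
  X2 = c(NA, NA, 1, 2)
)

# Split the non-grouping columns by group, then count every NA in each piece
na_by_group <- sapply(split(toy[-1], toy$Z), function(d) sum(is.na(d)))

# Assemble a two-column result similar in shape to the desired output
result <- data.frame(Z = names(na_by_group), sumNA = as.integer(na_by_group))
print(result)
```

Group A has one NA in X1 and two in X2, and group B has a single NA in X1, so sumNA should come out as 3 and 1.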
A dplyr solution using rowSums(.[-1]), i.e. row sums over all columns except the first:
library(dplyr)
df1 %>%
  group_by(Z) %>%
  summarise_all(~sum(is.na(.))) %>%      # NA count per column within each group
  transmute(Z, sumNA = rowSums(.[-1]))   # add the per-column counts across each row
# # A tibble: 5 x 2
#   Z     sumNA
#   <fct> <dbl>
# 1 A       559
# 2 B       597
# 3 C       661
# 4 D       560
# 5 E       596
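In dplyr 1.0.0 and later, the scoped verbs such as summarise_all() are superseded by across(). A minimal sketch of the same two-step approach in that style might look like the following. It assumes dplyr 1.0.0 or newer is installed, and the data frame `toy` is a made-up example rather than the question's df1.

```r
library(dplyr)

# Toy data (hypothetical): one grouping column, two value columns
toy <- data.frame(
  Z  = c("A", "A", "B", "B"),
  X1 = c(1, NA, 3, NA),
  X2 = c(NA, NA, 1, 2)
)

result <- toy %>%
  group_by(Z) %>%
  # Step 1: NA count per column within each group (grouping column is skipped)
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  # Step 2: add the per-column counts across each row, dropping the first column (Z)
  transmute(Z, sumNA = rowSums(.[-1]))

print(result)
```

As in the answer above, rowSums() returns doubles, so sumNA is a numeric column rather than an integer one.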