Count total missing values by group?

Question

I'm very new to this.

I have a similar problem to this one: "group by and then count missing variables?"

Taking the input data from that question:

# 10,000 rows: a grouping column Z and three integer columns that contain NAs
df1 <- data.frame(
  Z = sample(LETTERS[1:5], size = 10000, replace = TRUE),
  X1 = sample(c(1:10, NA), 10000, replace = TRUE),
  X2 = sample(c(1:25, NA), 10000, replace = TRUE),
  X3 = sample(c(1:5, NA), 10000, replace = TRUE))

As one user proposed, it's possible to use summarise_each:

df1 %>%
  group_by(Z) %>%
  summarise_each(funs(sum(is.na(.))))
#Source: local data frame [5 x 4]
#
#       Z    X1    X2    X3
#  (fctr) (int) (int) (int)
#1      A   169    77   334
#2      B   170    77   316
#3      C   159    78   348
#4      D   181    79   326
#5      E   174    69   341

However, I would like to get only the total number of missing values per group.

I've also tried this, but it didn't work: "R count NA by group"

Ideally, it should give me something like:

#       Z    sumNA
#  (fctr)   (int)
#1      A    580
#2      B    493
#3      C    585
#4      D    586
#5      E    584

Thanks.

Recommended answer

data.table solution

library(data.table)
setDT(df1)

df1[, .(sumNA = sum(is.na(.SD))), by = Z]

#    Z sumNA
# 1: A   559
# 2: C   661
# 3: E   596
# 4: B   597
# 5: D   560
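
The data.table call works because, within each group, .SD holds that group's non-grouping columns, is.na() on a table returns a logical matrix, and sum() counts its TRUE cells. A minimal base R sketch of the same idea follows. It is not part of the answer, and the data frame `toy` and the helper name `value_cols` are made up for illustration so the result can be checked by hand.

```r
# Hypothetical toy data (not the data from the question)
toy <- data.frame(
  Z  = c("A", "A", "B", "B", "B"),
  X1 = c(1, NA, 3, NA, NA),
  X2 = c(NA, NA, 2, 4, 5)
)

# Every column except the grouping column
value_cols <- setdiff(names(toy), "Z")

# Split the value columns into one data frame per group, then count
# the TRUE cells of is.na() in each piece
per_group <- sapply(split(toy[value_cols], toy$Z),
                    function(g) sum(is.na(g)))
per_group
```

Group A has three missing cells and group B has two, so `per_group` should hold 3 for A and 2 for B.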

dplyr solution using rowSums(.[-1]), i.e. row-sums for all columns except the first.

library(dplyr)

df1 %>%
  group_by(Z) %>%
  summarise_all(~sum(is.na(.))) %>%
  transmute(Z, sumNA = rowSums(.[-1]))

# # A tibble: 5 x 2
#   Z     sumNA
#   <fct> <dbl>
# 1 A       559
# 2 B       597
# 3 C       661
# 4 D       560
# 5 E       596
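
The two steps can also be done in the opposite order: count the missing values in each row first, then add those counts up within each group. The sketch below does this in base R only. It is an alternative illustration rather than the answer's method, and `toy`, `na_per_row`, and `res` are hypothetical names.

```r
# Hypothetical toy data (not the data from the question)
toy <- data.frame(
  Z  = c("A", "A", "B", "B", "B"),
  X1 = c(1, NA, 3, NA, NA),
  X2 = c(NA, NA, 2, 4, 5)
)

# Number of missing values in each row, ignoring the grouping column
na_per_row <- rowSums(is.na(toy[, setdiff(names(toy), "Z"), drop = FALSE]))

# Sum the per-row counts within each level of Z
res <- aggregate(na_per_row, by = list(Z = toy$Z), FUN = sum)
names(res)[2] <- "sumNA"
res
```

On this toy data `res` should have one row per group, with sumNA equal to 3 for A and 2 for B. Unlike rowSums(.[-1]), this version selects the value columns by name, so it does not depend on the grouping column being first.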
