NA values and the R aggregate function

Question

Here's a simple data frame with a missing value:

M = data.frame(Name = c('name', 'name'), Col1 = c(NA, 1), Col2 = c(1, 1))

When I apply aggregate to M this way:

aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)

the result is:

RowName Col1 Col2
name    1    1

So the entire first row is ignored. But if I do

aggregate(M[, 2:3], by = list(M$Name), FUN = sum, na.rm = TRUE)

the result is:

Group.1 Col1 Col2
name    1    2

So only the (1,1) entry is ignored.

This caused a major debugging headache in one of my scripts, since I thought these two calls were equivalent. Is there a good reason why the "formula" entry method is treated differently?

Thanks.

Recommended answer

Good question, but in my opinion, this shouldn't have caused a major debugging headache because it is documented quite clearly in multiple places in the manual page for aggregate.

First, in the usage section:

## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
          subset, na.action = na.omit)

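Later, in the argument descriptions, na.action is described as a function that indicates what should happen when the data contain NA values; by default, missing values in the given variables are ignored.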

I can't answer why the formula mode was written differently (that's something the function authors would have to answer), but using the above information, you can probably use the following:

aggregate(.~Name, M, FUN=sum, na.rm=TRUE, na.action=NULL)
#   Name Col1 Col2
# 1 name    1    2
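
To make the mechanism a bit more concrete, here is a minimal sketch using the data frame from the question. It shows what the default na.action = na.omit does to M before sum is ever called, and then compares the formula call against the data-frame call. Note that na.pass (a standard function from the stats package) is swapped in here as an alternative to na.action = NULL; it is not part of the answer above. The names res_formula and res_by are just illustrative.

```r
# Same data as in the question
M <- data.frame(Name = c("name", "name"), Col1 = c(NA, 1), Col2 = c(1, 1))

# The formula method applies na.action to the data first.
# With the default na.omit, any row containing an NA is dropped entirely,
# so only the second row is left for sum() to work on.
na.omit(M)

# na.pass keeps all rows, so sum(..., na.rm = TRUE) handles the NA itself.
res_formula <- aggregate(. ~ Name, data = M, FUN = sum,
                         na.rm = TRUE, na.action = na.pass)

# Data-frame method for comparison: it never drops whole rows.
res_by <- aggregate(M[, 2:3], by = list(Name = M$Name),
                    FUN = sum, na.rm = TRUE)

print(res_formula)  # Col1 = 1, Col2 = 2
print(res_by)       # Col1 = 1, Col2 = 2
```

With na.pass the two calls should agree, since na.rm = TRUE is then the only place where missing values are handled.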
