Problem description
Here's a simple data frame with a missing value:
M = data.frame(Name = c('name', 'name'), Col1 = c(NA, 1), Col2 = c(1, 1))
When I apply aggregate to M this way:
aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)
the result is:
RowName Col1 Col2
name 1 1
So the entire first row is ignored. But if I do
aggregate(M[, 2:3], by = list(M$Name), FUN = sum, na.rm = TRUE)
the result is:
Group.1 Col1 Col2
name 1 2
So only the (1,1) entry is ignored.
This caused a major debugging headache in one of my codes, since I thought these two calls were equivalent. Is there a good reason why the "formula" entry method is treated differently?
Thanks.
Recommended answer
Good question, but in my opinion, this shouldn't have caused a major debugging headache, because it is documented quite clearly in multiple places in the manual page for aggregate.
First, in the usage section:
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
subset, na.action = na.omit)
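Later, in the description:
na.action: a function which indicates what should happen when the data contain NA values. The default is to ignore missing values in the given variables.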
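To make the effect of the default na.action = na.omit concrete, here is a minimal sketch that reproduces the formula call by hand: it drops the incomplete rows first and only then aggregates. The names M_complete, by_hand and by_formula are illustrative and not part of the original post.

```r
# Example data from the question (both rows share the same Name).
M <- data.frame(Name = c("name", "name"), Col1 = c(NA, 1), Col2 = c(1, 1))

# The formula method applies na.action (default: na.omit) before FUN is
# called, so any row containing an NA is dropped entirely.
M_complete <- na.omit(M)

# Aggregating only the complete rows by hand...
by_hand <- aggregate(M_complete[, 2:3],
                     by = list(Name = M_complete$Name),
                     FUN = sum)

# ...gives the same sums as the formula call from the question.
by_formula <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE)

print(by_hand)
print(by_formula)
```

By the time sum() runs, the first row is already gone, which is why na.rm = TRUE has nothing left to remove in the formula version.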
I can't answer why the formula mode was written differently (that's something the function authors would have to answer), but using the above information, you can probably use the following:
aggregate(.~Name, M, FUN=sum, na.rm=TRUE, na.action=NULL)
# Name Col1 Col2
# 1 name 1 2
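As a variant, na.action = na.pass (a standard function in the stats package that returns its input unchanged) should behave the same way here. The sketch below compares it with the data-frame call from the question; res_pass and res_df are illustrative names, and na.pass is an alternative to, not a replacement for, the NULL suggested above.

```r
# Example data from the question (both rows share the same Name).
M <- data.frame(Name = c("name", "name"), Col1 = c(NA, 1), Col2 = c(1, 1))

# na.pass leaves the rows untouched, so the NA reaches sum() and is
# dropped there by na.rm = TRUE.
res_pass <- aggregate(. ~ Name, M, FUN = sum, na.rm = TRUE,
                      na.action = na.pass)

# Data-frame method from the question, for comparison.
res_df <- aggregate(M[, 2:3], by = list(M$Name), FUN = sum, na.rm = TRUE)

print(res_pass)
print(res_df)
```

With either setting, the handling of missing values is left to FUN itself, so the formula and data-frame methods agree on the sums.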