Question
I was surprised to find out that clara from library(cluster) allows NAs, but the function documentation says nothing about how it handles these values.
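A minimal reproduction of the behaviour described above, on made-up data (the matrix, the seed and the positions of the missing values are all invented for illustration): clara() accepts a matrix containing NAs, while kmeans() stops with an error on the same input.

```r
library(cluster)  # provides clara()

set.seed(1)
# Two made-up groups of 20 points each, in two dimensions.
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
x[c(3, 27), 1] <- NA            # knock out two values in column 1

fit <- clara(x, k = 2)          # runs despite the NAs
table(fit$clustering)           # cluster sizes

km <- try(kmeans(x, centers = 2), silent = TRUE)
inherits(km, "try-error")       # TRUE when kmeans() rejected the input
```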
So my questions are:
- How does clara handle NAs?
- Can this somehow be used for kmeans (which does not allow NAs)?
[Update] I did find these lines of code in the clara function:
inax <- is.na(x)
valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))
x[inax] <- valmisdat
which replace the missing values with valmisdat. I am not sure I understand the reason for using such a formula. Any ideas? Would it be more "natural" to treat NAs in each column separately, perhaps replacing them with the column mean or median?
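Run on a small toy matrix (made-up values), the three lines behave as follows. This only shows what the snippet computes; it does not by itself say what clara does with the replaced entries afterwards.

```r
# Toy matrix with two missing entries (made-up data).
x <- matrix(c(1, -7, NA, 4,
              2, NA, 3, 5), ncol = 2)

inax <- is.na(x)                                     # remember where the NAs were
valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))  # 1.1 * 7, i.e. about 7.7
x[inax] <- valmisdat                                 # overwrite the NAs

# One value for the whole matrix, larger in magnitude than every observed
# entry (unless all observed entries are zero).
observed <- x[!inax]
all(abs(observed) < valmisdat)
```

So valmisdat is a single number for the whole matrix, not one per column, and it lies outside the range of the observed data.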
Recommended answer
Although it is not stated explicitly, I believe that NAs are handled in the manner described in the ?daisy help page. Its Details section says that missing values in a row of x are not included in the dissimilarities involving that row.
Given that the same code is used internally by clara(), that is how I understand NAs in the data to be handled: they simply do not take part in the computation. This is a reasonably standard way of proceeding in such cases and is used, for example, in the definition of Gower's generalised similarity coefficient.
Update: The C sources in clara.c clearly indicate that this is how NAs are handled by clara() (lines 350-356 in ./src/clara.c):
if (has_NA && jtmd[j] < 0) { /* x[,j] has some Missing (NA) */
/* in the following line (Fortran!), x[-2] ==> seg.fault
{BDR to R-core, Sat, 3 Aug 2002} */
if (x[lj] == valmd[j] || x[kj] == valmd[j]) {
continue /* next j */;
}
}
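The effect of that continue can be sketched at the R level. The helper below, dist_skip_na, is hypothetical and not part of the cluster package; it uses is.na() where the C code compares against the sentinel in valmd[j]. The optional rescaling follows my reading of ?daisy for metric data, and whether clara() applies the same factor is an assumption here.

```r
# Hypothetical helper, not part of the cluster package: Euclidean distance
# between two rows that ignores every column where either value is missing.
dist_skip_na <- function(a, b, rescale = FALSE) {
  ok <- !(is.na(a) | is.na(b))        # columns usable for this pair
  if (!any(ok)) return(NA_real_)      # no shared column: distance undefined
  d2 <- sum((a[ok] - b[ok])^2)
  # Optional rescaling by p / n_ok (assumption, modelled on ?daisy).
  if (rescale) d2 <- d2 * length(a) / sum(ok)
  sqrt(d2)
}

a <- c(0, 0, NA)
b <- c(3, 4, 10)
dist_skip_na(a, b)   # third column is skipped, leaving sqrt(3^2 + 4^2)
```

This differs from mean or median imputation in that no value is invented for the missing entry: the pair of observations is simply compared on the columns both of them have.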