Question

I was surprised to find out that clara from library(cluster) allows NAs, but the function's documentation says nothing about how it handles these values.

So my questions are:

  1. How does clara handle NAs?
  2. Can this somehow be used for kmeans (which does not allow NAs)?

[Update] I did find these lines of code in the clara function:

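    inax <- is.na(x)                                     # TRUE where x is missing
    valmisdat <- 1.1 * max(abs(range(x, na.rm = TRUE)))  # 10% above the largest observed |value|
    x[inax] <- valmisdat                                 # overwrite every NA with that code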

These lines replace missing values with valmisdat. I'm not sure I understand the reason for using such a formula. Any ideas? Wouldn't it be more "natural" to treat NAs in each column separately, perhaps replacing them with the column mean or median?

Answer

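Although it is not stated explicitly, I believe that NAs are handled in the manner described on the ?daisy help page. Its Details section says that missing values in a row of x are not included in the dissimilarities involving that row.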

Given that clara() uses the same code internally, this is how I understand NAs in the data are handled: they simply don't take part in the computation. This is a reasonably standard way of proceeding in such cases and is used, for example, in the definition of Gower's generalised similarity coefficient.

Update: The C source in clara.c clearly indicates that this is how clara() handles NAs (lines 350-356 of ./src/clara.c):

    if (has_NA && jtmd[j] < 0) { /* x[,j] has some Missing (NA) */
        /* in the following line (Fortran!), x[-2] ==> seg.fault
           {BDR to R-core, Sat, 3 Aug 2002} */
        if (x[lj] == valmd[j] || x[kj] == valmd[j]) {
        continue /* next j */;
        }
    }
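
The quoted check can be placed in a self-contained C function to show how it ties back to the valmisdat replacement in the R code. A column flagged as containing missing values is skipped for a pair of rows whenever either entry equals that column's missing-value code. The function name dist_sq_coded(), its signature and the column-major layout are assumptions of this sketch, not clara.c's actual interface. Only the variable names has_NA, jtmd, valmd, lj and kj and the skip condition come from the quoted lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the quoted clara.c check.
 * Assumptions: x is an n-by-p matrix stored column-major (as in R),
 * jtmd[j] < 0 flags a column containing missing values, and valmd[j] is the
 * code those values were overwritten with.  Returns the squared Euclidean
 * distance between rows l and k over the usable columns and stores the number
 * of columns used in *n_used. */
double dist_sq_coded(const double *x, size_t n, size_t p,
                     int has_NA, const int *jtmd, const double *valmd,
                     size_t l, size_t k, size_t *n_used)
{
    assert(l < n && k < n);
    double sum = 0.0;
    size_t used = 0;
    for (size_t j = 0; j < p; ++j) {
        size_t lj = j * n + l, kj = j * n + k;  /* column-major indices */
        if (has_NA && jtmd[j] < 0) {            /* x[,j] has some missing values */
            if (x[lj] == valmd[j] || x[kj] == valmd[j])
                continue;                       /* skip column j for this pair */
        }
        double d = x[lj] - x[kj];
        sum += d * d;
        ++used;
    }
    *n_used = used;
    return sum;
}
```

Returning the number of columns used lets a caller rescale the result, for example by p / n_used as ?daisy documents. The quoted lines alone do not show which rescaling, if any, clara.c applies.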

09-14 14:33