r - k-means return values in R

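I am using the kmeans() function in R, and I am curious about the difference between the totss and tot.withinss attributes of the returned object. From the documentation they seem to return the same thing, but on my dataset totss is 66213.63 and tot.withinss is 6893.50.
Please let me know if you are familiar with the details.
Thanks!

Marius.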

Best answer

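Given the between-cluster sum of squares betweenss and the vector withinss of within-cluster sums of squares (one component per cluster), the formulas are: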

totss = tot.withinss + betweenss
tot.withinss = sum(withinss)
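Both identities can be checked directly on a fitted object. The following is a minimal sketch on simulated data; the seed, the data and the name fit are illustrative assumptions, not part of the question.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Illustrative data: two Gaussian blobs in two dimensions
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- kmeans(x, centers = 2)

# totss = tot.withinss + betweenss
all.equal(fit$totss, fit$tot.withinss + fit$betweenss)

# tot.withinss = sum(withinss)
all.equal(fit$tot.withinss, sum(fit$withinss))
```

Both comparisons should return TRUE, up to floating-point tolerance.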

For example, if there were only one cluster, then betweenss would be 0, withinss would have only one component, and totss = tot.withinss = withinss.
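The one-cluster case can be tried out as a quick sanity check. This is a sketch under assumed data; the seed, the matrix x1 and the name one are hypothetical.

```r
set.seed(1)                        # arbitrary seed
x1 <- matrix(rnorm(40), ncol = 2)  # illustrative data, 20 points
one <- kmeans(x1, centers = 1)     # force a single cluster

length(one$withinss)                    # a single component
one$betweenss                           # zero, up to floating-point rounding
all.equal(one$totss, one$tot.withinss)  # totss and tot.withinss coincide
```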

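To clarify further, we can compute these quantities ourselves from the cluster assignments, which may help make their meanings concrete. Consider the data x and the cluster assignments cl$cluster from the example in help(kmeans). Define the sum of squares function as follows: it subtracts from each column of x the mean of that column, then sums the squares of all elements of the resulting matrix: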

# or ss <- function(x) sum(apply(x, 2, function(x) x - mean(x))^2)
ss <- function(x) sum(scale(x, scale = FALSE)^2)
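As a small worked check of this definition, the sketch below applies both variants to a hand-sized matrix. The names ss2 and m are hypothetical, introduced only for illustration.

```r
ss  <- function(x) sum(scale(x, scale = FALSE)^2)
# Variant from the comment above, with the inner argument renamed for clarity
ss2 <- function(x) sum(apply(x, 2, function(col) col - mean(col))^2)

# Column 1 is 1, 2, 3: deviations -1, 0, 1, squares sum to 2
# Column 2 is 2, 4, 6: deviations -2, 0, 2, squares sum to 8
m <- matrix(c(1, 2, 3, 2, 4, 6), ncol = 2)

ss(m)   # 10
ss2(m)  # 10
```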

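Then we have the following. Note that cl$centers[cl$cluster, ] contains the fitted values, i.e. it is a matrix with one row per point such that the i-th row is the center of the cluster to which the i-th point belongs.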

example(kmeans) # create x and cl

betweenss <- ss(cl$centers[cl$cluster,]) # or ss(fitted(cl))

withinss <- sapply(split(as.data.frame(x), cl$cluster), ss)
tot.withinss <- sum(withinss) # or  resid <- x - fitted(cl); ss(resid)

totss <- ss(x) # or tot.withinss + betweenss

cat("totss:", totss, "tot.withinss:", tot.withinss,
  "betweenss:", betweenss, "\n")

# compare above to:

str(cl)

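EDIT:

Since this question was answered, R has added similar examples to the kmeans help page (see example(kmeans)) and a new fitted.kmeans method. The comments trailing the code lines above now show how the fitted method fits into these calculations.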
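To make the role of fitted() concrete, here is a minimal sketch on simulated data. The seed, the data and the name fit are assumptions for illustration; ss is the sum of squares function defined earlier, repeated so the snippet runs on its own.

```r
ss <- function(x) sum(scale(x, scale = FALSE)^2)

set.seed(123)  # arbitrary seed
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit <- kmeans(x, centers = 2)

# fitted() returns, for each point, the center of its cluster
all.equal(fitted(fit), fit$centers[fit$cluster, ])

# Sum of squares of the fitted values is betweenss
all.equal(ss(fitted(fit)), fit$betweenss)

# Sum of squares of the residuals is tot.withinss
resid <- x - fitted(fit)
all.equal(ss(resid), fit$tot.withinss)
```

Each comparison should return TRUE, up to floating-point tolerance.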

Regarding "r - k-means return values in R", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/8637460/