Measuring accuracy in hierarchical clustering (single link) in R

Question

How can I measure accuracy for hierarchical clustering (single link) in R with 2 clusters? Here is my code:

dcdata = read.csv("kkk.txt")
target = dcdata[, 3]      # column 3 is kept as the target
dcdata = dcdata[, 1:2]    # columns 1 and 2 are used for clustering
d = dist(dcdata)
hc_single = hclust(d, method = "single")
plot(hc_single)
clusters = cutree(hc_single, k = 2)
print(clusters)

Thanks!

Recommended answer

"Accuracy" is not quite the right term for clustering, but I guess you want to see whether the hierarchical clustering gives you clusters that coincide with your labels. As an example, I use the iris dataset with setosa vs. others as the target:

data = iris
# Binary target: setosa vs. everything else
target = ifelse(data$Species == "setosa", "setosa", "others")
table(target)
#> others setosa
#>    100     50

data = data[,1:4]
d = dist(data)
hc_single = hclust(d,method="single")
plot(hc_single)

There seem to be two major clusters. Now let's see how the target labels are distributed across the dendrogram:

library(dendextend)
dend <- as.dendrogram(hc_single)
COLS = c("turquoise","orange")
names(COLS) = unique(target)
dend <- color_labels(dend, col = COLS[target[labels(dend)]])
plot(dend)

Now, as you did, we cut the tree to get the clusters and cross-tabulate them against the target:

clusters = cutree(hc_single, k = 2)
table(clusters, target)
#>         target
#> clusters others setosa
#>        1      0     50
#>        2    100      0

In this case you get a perfect separation: all the data points in cluster 1 are setosa, and none of the points in cluster 2 are. You could think of this as 100% accuracy, but I would be careful about using that term, because the cluster numbers are arbitrary and the clustering never saw the labels.
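When the separation is less clean than here, the raw counts in the cross-table can be easier to read as proportions. The following is a minimal sketch, not part of the original answer, that rebuilds the same objects from the built-in iris data so it runs on its own; `tab` is just an illustrative variable name.

```r
# Rebuild the clustering from the steps above (built-in iris data)
target <- ifelse(iris$Species == "setosa", "setosa", "others")
hc_single <- hclust(dist(iris[, 1:4]), method = "single")
clusters <- cutree(hc_single, k = 2)

# Cross-tabulate cluster membership against the known labels
tab <- table(clusters, target)

# Row-wise proportions: the share of each label within each cluster
prop.table(tab, margin = 1)
```

Each row of the result sums to 1, so a row dominated by a single label indicates a cluster that lines up well with that label.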

You can roughly calculate the agreement like this:

# Most frequent label in each cluster (which.max also works with more than two classes)
Majority_class = tapply(factor(target), clusters, function(i) names(which.max(table(i))))

This tells you, for each cluster, which class is the majority. From there we can see how often that majority label agrees with the actual label of each point:

mean(Majority_class[clusters] == target)
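The two steps above can be wrapped into one small helper. This is only a sketch under my own assumptions: `cluster_purity` is a hypothetical name, not a function from base R or dendextend, and it takes the largest count in each row of the cross-table instead of looking up the majority label by name. The resulting quantity is usually called purity.

```r
# Hypothetical helper: fraction of points whose label matches the
# majority label of the cluster they were assigned to ("purity").
cluster_purity <- function(clusters, target) {
  tab <- table(clusters, target)
  # For each cluster (row), count the points in its most common class
  sum(apply(tab, 1, max)) / length(target)
}

# Toy check: cluster 1 holds {a, a, b}, cluster 2 holds {b, b}
toy <- cluster_purity(c(1, 1, 1, 2, 2), c("a", "a", "b", "b", "b"))
print(toy)  # (2 + 2) / 5 = 0.8

# Applied to the iris example from this answer
target <- ifelse(iris$Species == "setosa", "setosa", "others")
clusters <- cutree(hclust(dist(iris[, 1:4]), method = "single"), k = 2)
print(cluster_purity(clusters, target))
```

Keep in mind that purity can only go up as the number of clusters grows (it reaches 1 when every point is its own cluster), so it is most meaningful when comparing clusterings with the same k.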

