Problem description
I have a question about groups in cluster analysis (hierarchical clustering). As an example, this is the complete-linkage dendrogram of the Iris data set.
After running
> table(cutree(hc, 3), iris$Species)
this is the output:
    setosa versicolor virginica
  1     50          0         0
  2      0         23        49
  3      0         27         1
I have read on a statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. How, then, can I find out about the other two species? How do they end up in group 2 or group 3? Is there perhaps a calculation I need to know?
Recommended answer
I'm guessing that you're using this to create that image, which doesn't appear to be there at the moment.
> lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
> table(lmbjck, iris$Species)
lmbjck setosa versicolor virginica
     1     50          0         0
     2      0         23        49
     3      0         27         1
The distance object is computed by dist from the four measurement columns of plants from three different species. Viewed as a full matrix, it has identical row and column names.
> iris.dist <- dist(iris[1:4], "euclidean")
> # compare the names on the full matrix form of the distances
> identical(rownames(as.matrix(iris.dist)), colnames(as.matrix(iris.dist)))
[1] TRUE
That object is passed on to hclust, which constructs a tree; cutree then cuts the tree into three groups. The object iris.order holds the order in which the leaves of the dendrogram are drawn. The original row order of the data is preserved; only the drawing of the tree is based on this ordering.
> iris.hclust <- hclust(iris.dist)
> iris.cutree <- cutree(iris.hclust, 3)
> iris.order <- iris.hclust$order
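To make the difference between the two orderings concrete, here is a minimal sketch that rebuilds the same objects and checks them. It assumes the default complete linkage of hclust, as in the question; the name runs is only introduced for illustration.

```r
# Rebuild the objects from the answer (complete linkage is hclust's default).
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order  <- iris.hclust$order

# cutree() returns one group label per original row, in original row order.
length(iris.cutree) == nrow(iris)

# $order is a permutation of the row numbers: every row appears exactly once.
all(sort(iris.order) == seq_len(nrow(iris)))

# Group labels read left to right along the dendrogram. Cutting a tree gives
# whole branches, so each group shows up as one contiguous block of leaves.
runs <- rle(unname(iris.cutree[iris.order]))
runs
```

Indexing iris.cutree by iris.order is what moves the group labels from data order into dendrogram order.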
Here's proof. I've put together the original Species designations, the ordered species designations as they can be seen in the dendrogram, the order number, and the group from the cutree function.
> data.frame(original = iris$Species, ordered = iris$Species[iris.order],
order.num = iris.order, cutree = iris.cutree)
     original   ordered order.num cutree
1      setosa virginica       108      1
2      setosa virginica       131      1
3      setosa virginica       103      1
4      setosa virginica       126      1
5      setosa virginica       130      1
6      setosa virginica       119      1
...
103 virginica    setosa        31      2
104 virginica    setosa        26      2
105 virginica    setosa        10      2
106 virginica    setosa        35      2
107 virginica    setosa        13      3
108 virginica    setosa         2      2
...
Let's look at the output. In the first line, under order.num, there is the number 108. This means that this item (the first item on the left side of the dendrogram) comes from row 108. Skim down to line 108 and you can see that the original Species is indeed virginica. Note that the cutree column, like original, follows the original row order: the 1 in the first line is the group of row 1 (setosa), while the group of row 108 is shown on line 108, where it is 2. Now look at line 3. Under order.num you can see that this item comes from row 103. Again, if you go down and check the original species in row 103, it is (still) virginica. I'll leave it as an exercise for you to check other (random) rows and convince yourself that the original order used to construct the table at the beginning is preserved. Ergo, the table should be correct.
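As for how the group numbers themselves come about, the sketch below checks the statement quoted in the question: as far as I know, cutree numbers the groups by the order in which they first appear in the data, so row 1 is always in group 1 and the remaining labels follow the data order rather than the species. The names first.row and aligned are only introduced here for illustration.

```r
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order  <- iris.hclust$order

# First original row that falls into each of the three groups.
# Under the numbering assumption above, these positions are increasing.
first.row <- match(1:3, iris.cutree)
first.row

# Which species make up each group (same table as in the question).
group.table <- table(iris.cutree, iris$Species)
group.table

# Species and group side by side in dendrogram (left to right) order,
# so that every column of a line refers to the same plant.
aligned <- data.frame(order.num = iris.order,
                      species   = iris$Species[iris.order],
                      group     = unname(iris.cutree[iris.order]))
head(aligned)
```

There is no calculation that ties a group number to a species. The numbers are just labels, and you find out which species they cover by cross-tabulating, as in group.table.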