K-means in Python: determining which data points are associated with each centroid
Question
I've been using scipy.cluster.vq.kmeans for k-means clustering, but was wondering whether there is a way to determine which centroid each data point is (putatively) associated with.
Clearly you could do this manually, but as far as I can tell the kmeans function doesn't return this.
Recommended answer
scipy.cluster.vq also has a function, kmeans2, that returns the labels as well as the centroids.
In [6]: import numpy as np
In [7]: from scipy.cluster.vq import kmeans2
In [8]: X = np.random.randn(100, 2)
In [9]: centroids, labels = kmeans2(X, 3)
In [10]: labels
Out[10]:
array([2, 1, 2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1, 2, 1, 2, 0,
1, 0, 2, 0, 1, 2, 0, 1, 0, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 0,
2, 2, 0, 1, 0, 0, 0, 2, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 1, 1, 1, 1,
1, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 2, 0, 2, 2, 0,
1, 1, 0, 1, 0, 0, 0, 2])
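As a minimal sketch of how those labels might be used, the following self-contained script groups the rows of X by their assigned centroid. The names rng and clusters, the synthetic data, and the choice of minit='points' are assumptions for illustration and are not part of the original answer.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Synthetic data: 100 two-dimensional points (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# minit='points' draws the initial centroids from the data itself
# (an assumed choice; the original answer uses the default).
centroids, labels = kmeans2(X, 3, minit='points')

# labels[i] is the row index in `centroids` of the centroid assigned to X[i],
# so a boolean mask selects all points belonging to cluster k.
clusters = {k: X[labels == k] for k in range(len(centroids))}

for k, members in clusters.items():
    print(k, centroids[k], len(members))
```

Because labels has one entry per row of X, the same boolean-mask indexing works for any per-point array that is aligned with X.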
Otherwise, if you must use kmeans, you can also use vq to get the labels:
In [17]: from scipy.cluster.vq import kmeans, vq
In [18]: codebook, distortion = kmeans(X, 3)
In [21]: code, dist = vq(X, codebook)
In [22]: code
Out[22]:
array([1, 0, 1, 0, 2, 2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
2, 2, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1,
0, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 0, 0, 2, 2, 2, 2, 1, 0, 2, 2, 2,
0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 2, 0, 2, 0, 2, 2, 1, 1, 1, 1, 1,
2, 0, 2, 0, 2, 1, 1, 1])
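For the "do it manually" route mentioned in the question, one possible sketch computes the Euclidean distance from every point to every centroid with NumPy and picks the nearest one. Barring exact distance ties, this should agree with what vq returns. The helper name assign_to_centroids and the synthetic data are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def assign_to_centroids(X, codebook):
    """Hypothetical helper: nearest-centroid index and distance for each row of X."""
    # Broadcasting gives differences of shape (n_points, n_centroids, n_features).
    diffs = X[:, np.newaxis, :] - codebook[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return dists.argmin(axis=1), dists.min(axis=1)

# Synthetic data: 100 two-dimensional points (illustrative only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

codebook, distortion = kmeans(X, 3)
code, dist = vq(X, codebook)
manual_code, manual_dist = assign_to_centroids(X, codebook)

# Compare the manual assignment with the one computed by vq.
print(np.array_equal(code, manual_code))
print(np.allclose(dist, manual_dist))
```

The broadcasting approach builds an n_points by n_centroids distance matrix, which is fine for small data; for large inputs vq is likely the more memory-friendly option.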