How to relate k-means clusters back to their instances in Python

Question

I've read the docs here and looked at this tutorial, but I am still missing something fundamental about using K-means in scikit-learn:

Say I have a dataset like this:

|UserName| Variable1 | Variable2 | Variable3 |  Cluster |
|  bob   |    1      |     3     |    7      |          |
|  joe   |    2      |     4     |    8      |          |
|  bill  |    1      |     6     |    4      |          |

Since K-means takes a numpy array, I have to strip out the username and use only the numerical variables. But after the clusters have been created, how do I relate them back to each individual user for further analysis? That is, how would I fill the "Cluster" column with the corresponding cluster number?

Recommended answer

Here's an example, assuming you read the data into a list from a file:

import sklearn.cluster
import numpy as np

data = [
    ['bob', 1, 3, 7],
    ['joe', 2, 4, 8],
    ['bill', 1, 6, 4],
]

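# Keep the usernames separately, in the same order as the rows
labels = [x[0] for x in data]
# Only the numeric columns are passed to k-means
a = np.array([x[1:] for x in data])
clust_centers = 2

# k_means returns a tuple, not a fitted estimator
model = sklearn.cluster.k_means(a, clust_centers)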

model now contains a tuple of (centroids, labels, inertia).
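To make the tuple easier to work with, it can be unpacked into named variables. The sketch below reuses the toy data from above; the names `centroids`, `cluster_ids` and `inertia` are illustrative (note that `labels` in the answer holds the usernames, so a different name is used here for the cluster ids), and `n_init=10` and `random_state=0` are assumptions added only to make the run repeatable.

```python
import numpy as np
import sklearn.cluster

# Same toy data as above; only the numeric columns go to k-means.
data = [
    ['bob', 1, 3, 7],
    ['joe', 2, 4, 8],
    ['bill', 1, 6, 4],
]
a = np.array([row[1:] for row in data], dtype=float)

# n_init and random_state are illustrative choices, not part of the answer.
centroids, cluster_ids, inertia = sklearn.cluster.k_means(
    a, 2, n_init=10, random_state=0
)

print(centroids.shape)  # one row per cluster, one column per variable
print(cluster_ids)      # one cluster id per input row, in input order
print(inertia)          # sum of squared distances to the nearest centroid
```

The second element is the one that matters for the question: it has one entry per input row, in the same order as the rows that were passed in.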

So map each username to its cluster label like this:

clusters = dict(zip(labels, model[1]))

And to print the cluster id for one user, for example 'bob':

print(clusters['bob'])
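For further analysis it is often useful to go the other way as well and list the users in each cluster. A minimal sketch, using hard-coded hypothetical assignments in place of the real `dict(zip(labels, model[1]))` result so that it runs on its own:

```python
from collections import defaultdict

# Hypothetical assignments, standing in for dict(zip(labels, model[1])).
clusters = {'bob': 0, 'joe': 0, 'bill': 1}

# Invert the mapping: cluster id -> list of usernames.
members = defaultdict(list)
for user, cluster_id in clusters.items():
    members[cluster_id].append(user)

for cluster_id in sorted(members):
    print(cluster_id, members[cluster_id])
```

The actual cluster numbers are arbitrary and may differ between runs; only the grouping is meaningful.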

Or send it back out to a csv like this:

for d in data:
    print('%s,%d' % (','.join([str(x) for x in d]), clusters[d[0]]))
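If the table is held in a pandas DataFrame, the "Cluster" column from the question can be filled in directly. This is a sketch under assumptions, not part of the answer above: it assumes pandas is available, uses the `KMeans` estimator class instead of the `k_means` function, and fixes `n_init` and `random_state` only for repeatability.

```python
import pandas as pd
from sklearn.cluster import KMeans

df = pd.DataFrame(
    [['bob', 1, 3, 7], ['joe', 2, 4, 8], ['bill', 1, 6, 4]],
    columns=['UserName', 'Variable1', 'Variable2', 'Variable3'],
)

# Cluster on the numeric columns only; the row order is preserved.
features = df[['Variable1', 'Variable2', 'Variable3']].to_numpy(dtype=float)
km = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict returns one cluster id per row, so it lines up with df's rows.
df['Cluster'] = km.fit_predict(features)

# Write the table, including the new column, as CSV text.
print(df.to_csv(index=False))
```

Because the cluster ids come back in the same order as the input rows, assigning them as a new column keeps each id attached to the right user.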
