Problem description

I want to run k-means clustering on a dataset (namely, Sample_Data) with three variables (columns), such as the one below:

     A  B  C
1    12 10 1
2    8  11 2
3    14 10 1
.    .   .  .
.    .   .  .
.    .   .  .

Typically, after scaling the columns and determining the number of clusters, I use this function in R:

Sample_Data <- scale(Sample_Data)
output_kmeans <- kmeans(Sample_Data, centers = 5, nstart = 50)

But what if some variables should count more than others? Suppose variable (column) A is more important than the other two. How can I include their weights in the model? Thank you all.

Recommended answer

You have to use weighted k-means clustering, like the one provided in the flexclust package:

https://cran.r-project.org/web/packages/flexclust/flexclust.pdf

The function

cclust(x, k, dist = "euclidean", method = "kmeans",
weights=NULL, control=NULL, group=NULL, simple=FALSE,
save.data=FALSE)

performs k-means clustering, hard competitive learning or neural gas on a data matrix. Its weights argument is an optional vector of weights to be used in the fitting process, and it works only in combination with hard competitive learning.

A toy example using the iris data:

library(flexclust)
data(iris)
cl <- cclust(iris[, -5], k = 3, save.data = TRUE, weights = c(1, 0.5, 1, 0.1), method = "hardcl")
cl  
    kcca object of family ‘kmeans’ 

    call:
    cclust(x = iris[, -5], k = 3, method = "hardcl", weights = c(1, 0.5, 1, 0.1), save.data = TRUE)

    cluster sizes:

     1  2  3 
    50 59 41 

As you can see from the output of cclust, the family is still kmeans even when hard competitive learning is used. The difference lies in how observations are assigned to clusters during the training phase.
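To make the training-phase difference concrete, here is a minimal base R sketch of hard competitive learning in its usual textbook form: one observation is drawn at a time, and only the nearest center is moved toward it. This is not flexclust's implementation. The function name hard_competitive, the constant learning rate and the iteration count are assumptions chosen for illustration (practical implementations usually decay the rate).

```r
# Illustrative sketch only: NOT the code used inside flexclust::cclust.
# hard_competitive, n_iter and rate are hypothetical names/values.
hard_competitive <- function(x, k, n_iter = 2000, rate = 0.05) {
  x <- as.matrix(x)
  # Start from k randomly chosen observations
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (i in seq_len(n_iter)) {
    obs <- x[sample(nrow(x), 1), ]            # draw one observation
    d <- colSums((t(centers) - obs)^2)        # squared distance to each center
    win <- which.min(d)                       # the winning (nearest) center
    # Only the winner moves, a small step toward the observation
    centers[win, ] <- centers[win, ] + rate * (obs - centers[win, ])
  }
  centers
}

set.seed(1)
centers <- hard_competitive(scale(iris[, -5]), k = 3)
centers
```

Batch k-means, by contrast, assigns every observation to its nearest center first and then recomputes each center as the mean of its assigned observations.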

The weights parameter is just a numeric vector. In general, I use values between 0.01 (minimum weight) and 1 (maximum weight).
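The documentation quoted above does not say whether weights applies to columns or to observations, so it is worth checking before relying on it. A package-independent alternative for weighting variables, sketched below, is a different technique from cclust's weights argument: multiply each scaled column j by sqrt(w_j), so that the squared Euclidean distance used by kmeans becomes the weighted sum of w_j * (x_j - y_j)^2. The synthetic Sample_Data, the name col_weights and the weight values are assumptions for illustration only.

```r
# Synthetic stand-in for the real Sample_Data (assumed values).
set.seed(42)
Sample_Data <- data.frame(
  A = rnorm(100, mean = 12, sd = 3),
  B = rnorm(100, mean = 10, sd = 2),
  C = rnorm(100, mean = 1.5, sd = 0.5)
)

# Hypothetical weights: A counts twice as much as B or C.
col_weights <- c(A = 1, B = 0.5, C = 0.5)

# Standardize first, so the weights act on comparable scales.
scaled <- scale(Sample_Data)

# Multiply column j by sqrt(w_j); squared distances then carry weight w_j.
weighted <- sweep(scaled, 2, sqrt(col_weights), `*`)

output_kmeans <- kmeans(weighted, centers = 5, nstart = 50)

# Express the centers in the standardized (unweighted) units again.
centers_scaled <- sweep(output_kmeans$centers, 2, sqrt(col_weights), `/`)
centers_scaled
```

A weight of 0 removes a variable from the clustering entirely, and equal weights reproduce the ordinary kmeans partition, since rescaling all columns by the same factor does not change which center is nearest.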
