Question
I want to run k-means clustering on a dataset (namely, Sample_Data) with three variables (columns), such as the one below:
A B C
1 12 10 1
2 8 11 2
3 14 10 1
. . . .
. . . .
. . . .
Typically, after scaling the columns and determining the number of clusters, I would use this function in R:
Sample_Data <- scale(Sample_Data)
output_kmeans <- kmeans(Sample_Data, centers = 5, nstart = 50)
But what if some variables should be favored over others? Suppose variable (column) A is more important than the other two. How can I insert their weights into the model? Thank you all.
Recommended answer
You have to use a weighted k-means clustering, like the one provided in the flexclust package:
https://cran.r-project.org/web/packages/flexclust/flexclust.pdf
Function:
cclust(x, k, dist = "euclidean", method = "kmeans",
weights=NULL, control=NULL, group=NULL, simple=FALSE,
save.data=FALSE)
Perform k-means clustering, hard competitive learning or neural gas on a data matrix.
weights: An optional vector of weights to be used in the fitting process. Works only in combination with hard competitive learning.
A toy example using the iris data:
library(flexclust)
data(iris)
cl <- cclust(iris[, -5], k = 3, save.data = TRUE, weights = c(1, 0.5, 1, 0.1), method = "hardcl")
cl
kcca object of family ‘kmeans’
call:
cclust(x = iris[, -5], k = 3, method = "hardcl", weights = c(1, 0.5, 1, 0.1), save.data = TRUE)
cluster sizes:
1 2 3
50 59 41
As you can see from the output of cclust, the family is still kmeans even when hard competitive learning is used. The difference lies in how clusters are assigned during the training phase.
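To make the idea of hard competitive learning more concrete, here is a minimal sketch of an online, winner-take-all update: draw one observation at a time and move only the closest center toward it. This is an illustration under my own assumptions, not flexclust's actual implementation; the function name hard_competitive_sketch and the parameters n_steps and rate are hypothetical.

```r
# Illustrative online (winner-take-all) clustering update.
# NOT flexclust's code: function name, n_steps and rate are assumptions.
hard_competitive_sketch <- function(x, k, n_steps = 2000, rate = 0.05) {
  x <- as.matrix(x)
  # Start from k randomly chosen observations.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  for (step in seq_len(n_steps)) {
    # Draw one observation at random.
    obs <- x[sample(nrow(x), 1), ]
    # Squared Euclidean distance from the observation to every center.
    d2 <- rowSums(sweep(centers, 2, obs)^2)
    winner <- which.min(d2)
    # Move only the closest center a small step toward the observation.
    centers[winner, ] <- centers[winner, ] + rate * (obs - centers[winner, ])
  }
  centers
}

set.seed(1)
centers <- hard_competitive_sketch(scale(iris[, -5]), k = 3)
dim(centers)
```

Classic k-means, by contrast, recomputes every center as the mean of all points currently assigned to it on each pass.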
The weights parameter is just a sequence of numbers; in general I use values between 0.01 (minimum weight) and 1 (maximum weight).
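The documentation excerpt quoted above does not say whether weights applies to variables or to observations, so it may be worth checking before relying on it. If the goal is specifically to make one column count more in the distance, an alternative that needs only base R is to multiply each scaled column by the square root of its weight and then run the usual kmeans. The squared Euclidean distance on the rescaled data then equals the weighted sum of squared differences on the scaled data. This is a sketch of a different technique, not what cclust does; the name var_weights is hypothetical and the weights are the same illustrative values as in the toy example.

```r
# Variable weighting by column rescaling, using only base R.
data(iris)
x <- scale(iris[, -5])

# Illustrative weights, one per column (assumption: larger = more important).
var_weights <- c(1, 0.5, 1, 0.1)

# Multiply column j by sqrt(w_j), so that squared Euclidean distance on
# x_weighted equals sum_j w_j * (x_j - y_j)^2 on the scaled data.
x_weighted <- sweep(x, 2, sqrt(var_weights), `*`)

set.seed(123)
fit <- kmeans(x_weighted, centers = 3, nstart = 50)
table(fit$cluster)
```

For the data in the question, the same steps would apply to scale(Sample_Data) with one weight per column, for example a larger value for A than for B and C.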