Problem description
I have a dataset that consists entirely of boolean variables. It is exactly like the transformed animals dataset below, only with many more columns.
# http://stats.stackexchange.com/questions/27323/cluster-analysis-of-boolean-vectors-in-r
library(cluster)
head(mona(animals)[[1]])
    war fly ver end gro hai
ant   0   0   0   0   1   0
bee   0   1   0   0   1   1
cat   1   0   1   0   0   1
cpl   0   0   0   0   0   1
chi   1   0   1   1   1   1
cow   1   0   1   0   1   1
The goal is to rearrange the rows in such a way that groupings of similar membership patterns are easier to identify visually.
I figured some kind of clustering algorithm would be the way to go, but I'm not sure which functions to use or how exactly to go about it.
Ideally, the table would be plotted as a kind of checkerboard, with each square shaded according to whether that value is true or false.
Answer
This solution uses hierarchical clustering to reorder the rows (observations). Note that it does not scale well to large numbers of observations: the dissimilarity matrix holds one entry for every pair of rows, so its size grows quadratically. An alternative algorithm for many observations was suggested in this answer, but I didn't fully understand it or see how to implement it from the chapter it referenced.
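A rough count shows why the dissimilarity matrix becomes a problem. For n observations it stores one value per pair of rows. Assuming each value is an 8-byte double (R's numeric type) and taking a hypothetical n = 50,000 rows purely for illustration, the stored dissimilarities alone come to about 10 GB, before any working memory the clustering step needs:

```latex
\begin{aligned}
\binom{n}{2} = \frac{n(n-1)}{2} &= \frac{50\,000 \times 49\,999}{2} = 1\,249\,975\,000 \ \text{entries} \\
8 \ \text{bytes} \times 1\,249\,975\,000 &= 9\,999\,800\,000 \ \text{bytes} \approx 10 \ \text{GB}
\end{aligned}
```

Doubling n roughly quadruples this figure.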
library(cluster)
library(reshape2)
library(ggplot2)
# testing that it works using the categorical animals dataset
adData <- mona(animals)$data
# import the data, encoded with 0s and 1s for membership
# adData <- read.csv('adData.csv')
# clustering based off this answer https://stats.stackexchange.com/a/48364
# create a dissimilarity matrix
disimilarAdData <- daisy(adData)
# hierarchically cluster by dissimilarity
clusteredAdData <- agnes(disimilarAdData)
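# reorder the rows to follow the dendrogram's leaf order, so similar rows sit next to each other
orderedAdData <- adData[clusteredAdData$order, ]
# convert to logical so ggplot2 uses a two-color discrete fill instead of a continuous gradient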
plotData <- sapply(as.data.frame(orderedAdData), as.logical)
row.names(plotData) <- row.names(orderedAdData)
# plot the matrix as a checkerboard of shaded tiles, based on
# http://stackoverflow.com/questions/21316363/plot-and-fill-chessboard-like-area-and-the-similars-in-r
ggplot(melt(plotData), aes(x=Var2, y=Var1, fill=value)) + geom_tile()
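The answer above reorders only the rows. A minimal sketch of one possible variation, not taken from the answer: it swaps daisy()/agnes() for base R's dist(method = "binary") (a Jaccard-type distance suited to 0/1 data) and hclust(), and also clusters the columns so that related variables end up adjacent too. The helper name reorder_binary is hypothetical, and average linkage is an assumption chosen to mirror agnes()'s default. It still builds full dissimilarity matrices, so it has the same scaling limitation.

```r
# Hedged sketch (not from the original answer): reorder BOTH rows and
# columns of a 0/1 matrix using base R's dist() and hclust().
library(cluster)  # used here only for the example `animals` data and mona()

# reorder_binary() is a hypothetical helper name.
# x: numeric 0/1 matrix with row and column names (at least 2 rows and 2 columns).
# linkage: passed to hclust(); "average" is assumed here to mirror agnes()'s default.
reorder_binary <- function(x, linkage = "average") {
  x <- as.matrix(x)
  # method = "binary": among positions where at least one of the two
  # vectors is non-zero, the proportion where exactly one is non-zero
  row_tree <- hclust(dist(x, method = "binary"), method = linkage)
  col_tree <- hclust(dist(t(x), method = "binary"), method = linkage)
  # $order is each dendrogram's leaf order, so rows (and columns) that
  # merge early in the tree are placed next to each other
  x[row_tree$order, col_tree$order, drop = FALSE]
}

# Example on the same data used in the question
animalData <- mona(animals)$data
orderedAnimalData <- reorder_binary(animalData)
print(orderedAnimalData)
```

The reordered matrix can be passed through the same as.logical conversion and melt()/ggplot() plotting step used in the answer.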