How to spread out a community graph made with the igraph package in R

Problem description

I am trying to find communities in tweet data. The cosine similarity between different words forms the adjacency matrix, and I then created a graph from that adjacency matrix. Visualizing the graph is the task here:

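# Packages this script appears to rely on (assumed, not shown in the original post):
# tm for DocumentTermMatrix/removeSparseTerms, lsa for cosine, igraph for the graph functions
library(tm)
library(lsa)
library(igraph)

# Document Term Matrix
dtm = DocumentTermMatrix(tweets)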

### adjust threshold here
dtms = removeSparseTerms(dtm, 0.998)
dim(dtms)

# cosine similarity matrix
t = as.matrix(dtms)

# comparing two word feature vectors
#cosine(t[,"yesterday"], t[,"yet"])

numWords = dim(t)[2]

# cosine measure between all column vectors of a matrix.
adjMat = cosine(t)

r = 3
for(i in 1:numWords)
{
  highElement  = sort(adjMat[i,], partial=numWords-r)[numWords-r]
  adjMat[i,][adjMat[i,] <  highElement] = 0
}

# build graph from the adjacency matrix
g = graph.adjacency(adjMat, weighted=TRUE, mode="undirected", diag=FALSE)
V(g)$name

# remove loops and multiple edges
g = simplify(g)
wt = walktrap.community(g, steps=5) # default steps=4
table(membership(wt))

# set vertex color & size
nodecolor = rainbow(length(table(membership(wt))))[as.vector(membership(wt))]
nodesize = as.matrix(round((log2(10*membership(wt)))))
nodelayout = layout.fruchterman.reingold(g,niter=1000,area=vcount(g)^1.1,repulserad=vcount(g)^10.0, weights=NULL)

par(mai=c(0,0,1,0))
plot(g,
     layout=nodelayout,
     vertex.size = nodesize,
     vertex.label=NA,
     vertex.color = nodecolor,
     edge.arrow.size=0.2,
     edge.color="grey",
     edge.width=1)

I just want to have some more gap between separate clusters/communities.

Solution

To the best of my knowledge, you can't lay out vertices of the same community close to each other using igraph alone. I have implemented this function in my package NetPathMiner, but installing the package just for the visualization function seems like a lot of effort. I will write a simple version of it here and explain what it does.

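layout.by.attr <- function(graph, wc, cluster.strength=1,layout=layout.auto) {
        # wc: one community label per vertex, in vertex order
        g <- graph.edgelist(get.edgelist(graph)) # create a lightweight copy of graph w/o the attributes.
        E(g)$weight <- 1 # original edges get unit weight

        # pair each vertex id with its community label
        attr <- cbind(id=1:vcount(g), val=wc)
        # add one vertex per community and link every vertex to its community's vertex
        g <- g + vertices(unique(attr[,2])) + igraph::edges(unlist(t(attr)), weight=cluster.strength)

        # compute the layout on the augmented graph, keeping only the original vertices' rows
        l <- layout(g, weights=E(g)$weight)[1:vcount(graph),]
        return(l)
}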

Basically, the function adds one extra vertex per community and connects it to all vertices belonging to that community. The layout is then calculated on the new graph. Since the members of each community are now connected through a common vertex, they tend to cluster together.
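The hub-vertex idea can also be sketched with the current igraph function names. This is only an illustration under stated assumptions, not the NetPathMiner implementation: `layout_with_hubs` is a hypothetical helper name, the graph is assumed to be undirected, and the hubs are addressed by vertex index so the sketch does not depend on vertex names.

```r
library(igraph)

# Hypothetical helper (not part of igraph or NetPathMiner): lay out a graph
# after attaching one temporary hub vertex per community.
layout_with_hubs <- function(graph, membership, cluster.strength = 1,
                             layout = layout_with_fr) {
  # Recode community labels as consecutive integers 1..k.
  membership <- as.integer(as.factor(as.vector(membership)))
  n <- vcount(graph)
  k <- max(membership)

  # Lightweight undirected copy: same vertices and edges, no attributes.
  g <- make_empty_graph(n, directed = FALSE)
  g <- add_edges(g, as.vector(t(as_edgelist(graph, names = FALSE))))
  E(g)$weight <- 1

  # The hub of community j gets index n + j; link every vertex to its hub.
  g <- add_vertices(g, k)
  hub_edges <- as.vector(rbind(seq_len(n), n + membership))
  g <- add_edges(g, hub_edges, weight = cluster.strength)

  # Compute the layout on the augmented graph, then drop the hub rows.
  layout(g, weights = E(g)$weight)[seq_len(n), , drop = FALSE]
}

# Toy graph standing in for the word graph from the question.
set.seed(1)
g_toy  <- sample_islands(3, 10, 0.6, 2)
wc_toy <- cluster_walktrap(g_toy)
l_toy  <- layout_with_hubs(g_toy, membership(wc_toy), cluster.strength = 10)
```

The resulting matrix can be passed to `plot(g_toy, layout = l_toy)`. Because the hub rows are dropped, only the original vertices and edges are drawn.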

As Gabor said in the comment, increasing edge weights has a similar effect. The function makes use of this: increasing cluster.strength gives the edges between the created vertices and their communities higher weights.
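The edge-weight idea can also be tried directly, without adding any vertices. The following is a minimal sketch on a toy graph, assuming a recent igraph version. The weight values 10 and 1 are illustrative choices, not recommendations, and how strongly the communities separate depends on the graph.

```r
library(igraph)

set.seed(42)
g_w  <- sample_islands(3, 10, 0.6, 2)   # toy graph standing in for the word graph
wc_w <- cluster_walktrap(g_w)

# crossing() is TRUE for edges whose endpoints lie in different communities.
within_weight  <- 10   # illustrative value
between_weight <- 1    # illustrative value
w <- ifelse(crossing(wc_w, g_w), between_weight, within_weight)

# In the Fruchterman-Reingold layout, a larger weight strengthens the
# attraction along an edge, so heavily weighted edges end up shorter.
l_w <- layout_with_fr(g_w, weights = w)

plot(g_w, layout = l_w,
     vertex.color = as.vector(membership(wc_w)),
     vertex.label = NA)
```

The weights are used only for the layout here; the graph itself is left unchanged, so community detection is unaffected.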

If this is still not enough, you can extend this principle (calculating the layout on a more connected graph) by adding edges between all vertices of the same community (forming a clique). From my experience, this is a bit of an overkill.
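For completeness, the clique variant might look like the sketch below. It assumes a recent igraph version and a toy graph; the densified graph `g_dense` is used only to compute coordinates, and the original graph is the one that gets plotted.

```r
library(igraph)

set.seed(7)
g_c  <- sample_islands(3, 8, 0.5, 2)   # toy graph standing in for the word graph
wc_c <- cluster_walktrap(g_c)
memb <- as.vector(membership(wc_c))

# Collect every within-community vertex pair as a flat c(from, to, ...) vector.
clique_edges <- unlist(lapply(split(seq_len(vcount(g_c)), memb), function(vs) {
  if (length(vs) < 2) integer(0) else as.vector(combn(vs, 2))
}), use.names = FALSE)

# Layout-only copy: simplify() merges pairs that were already connected.
g_dense <- simplify(add_edges(g_c, clique_edges))
l_c <- layout_with_fr(g_dense)

# Draw the original edges at the coordinates computed on the dense graph.
plot(g_c, layout = l_c, vertex.color = memb, vertex.label = NA)
```

The number of added edges grows quadratically with community size, which is one reason this can be excessive for large communities.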

07-31 06:06