问题描述
我在igraph中有一个大图(实际上是几个),大约有100,000个顶点,每个顶点的属性为true
或false
.对于每个顶点,我想计算有多少个直接连接到该顶点的具有该属性的顶点.我当前的解决方案是以下函数,该函数以图形为参数.
I have a large graph (several, actually) in igraph—on the order of 100,000 vertices—and each vertex has an attribute which is either true
or false
. For each vertex, I would like to count how many of the vertices directly connected to it have the attribute. My current solution is the following function, which takes as its argument a graph.
attrcount <- function(g) {
nb <- neighborhood(g,order=1)
return(sapply(nb,function(x) {sum(V(g)$attr[x]}))
}
对于具有该属性的顶点,它返回一个计数向量,该计数与1偏离,但是我可以轻松地对其进行调整.
This returns a vector of counts which is off by 1 for vertices which have the attribute, but I can adjust this easily.
问题是运行速度非常慢,并且似乎应该有一种快速的方法来进行此操作,例如,计算每个顶点的度实际上是使用degree(g)
即时实现的.
The problem is that this runs incredibly slowly, and it seems like there should be a fast way to do this, since, for instance, computing the degree of each vertex is practically instantaneous with degree(g)
.
我这样做是不是很愚蠢?
Am I doing this a stupid way?
作为一个例子,假设这是我们的图形.
As an example, suppose this was our graph.
set.seed(42)
g <- erdos.renyi.game(169081, 178058, type="gnm")
V(g)$att <- as.logical(rbinom(vcount(g), 1, 0.5))
推荐答案
使用get.adjlist
查询所有相邻的顶点,然后在此列表上的sapply
(或tapply
可能更快)以获取计数.将属性存储在向量中也是值得的,因为这样就不必一直提取它.
Use get.adjlist
to query all adjacent vertices, and then sapply
(or tapply
might be even faster) on this list to get the counts. It is also worth storing the attribute in a vector, because then you don't need to extract it all the time.
system.time({
al <- get.adjlist(g)
att <- V(g)$att
res <- sapply(al, function(x) sum(att[x]))
})
# user system elapsed
# 0.571 0.005 0.576
轻按
system.time({
al <- get.adjlist(g)
alv <- unlist(al)
alf <- factor(rep(seq_along(al), sapply(al, length)),
levels=seq_along(al))
att <- V(g)$att
res2 <- tapply(att[alv], alf, sum)
res2[is.na(res2)] <- 0
})
# user system elapsed
# 1.121 0.020 1.144
all(res == res2)
# TRUE
我有些惊讶,但是tapply
解决方案实际上要慢一些.
Somewhat a surprise to me, but the tapply
solution is actually slower.
如果这还不够,那么我想您仍然可以通过用C/C ++编写它来使其更快.
If this is still not enough, then I guess you can still make it faster by writing it in C/C++.
这篇关于计算顶点邻域中有多少个顶点在igraph中具有R的属性的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!