NetworkX有哪些可扩展性问题

NetworkX有哪些可扩展性问题

本文介绍了NetworkX有哪些可扩展性问题?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我对有数百万个节点和数千万个边缘的大型网络的网络分析感兴趣。我希望能够做许多事情,比如解析来自多种格式的网络,查找连接的组件,检测社区,以及运行诸如PageRank等中心性措施。



我被NetworkX吸引,因为它有一个很好的api,很好的文档,并且多年来一直在积极开发中。另外,因为它是用python编写的,所以它应该很快就可以开发出来。



在最近的演示文稿中(幻灯片在github上可用),它声称:

演示文稿还指出,NetworkX的基本算法是在C / Fortran中实现的。然而,从源代码看,它看起来像NetworkX主要是用python编写的。我对源代码并不太熟悉,但我知道有几个例子,其中NetworkX使用numpy来完成繁重的任务(反过来使用C / Fortran来完成线性代数)。例如,文件 networkx / networkx / algorithms / centrality / eigenvector.py 使用numpy来计算特征向量。

有没有人知道这种调用像numpy这样的优化库的策略在整个NetworkX中是非常普遍的,还是只有几个算法可以实现呢?还有谁可以描述与NetworkX相关的其他可扩展性问题?



来自NetworkX Lead Programmer的回复
我在NetworkX邮件中提出了这个问题列表中,并且Aric Hagberg回答说:


解决方案

你的大问题将会是记忆。 Python简单地不能处理数以千万计的对象,而不会在你的类实现中跳过循环。许多对象的内存开销太高,达到2GB,32位代码无法工作。有办法解决它 - 使用插槽,阵列或numpy。它应该是可以的,因为networkx是为了性能而编写的,但是如果有几件事情不起作用,我会检查你的内存使用情况。



至于缩放,算法基本上是唯一与图形有关的事情。如果图形算法做得不对,那么图形算法往往具有很大的缩放比例,并且它们可能会像其他任何语言一样在Python中完成。

I'm interested in network analysis on large networks with millions of nodes and tens of millions of edges. I want to be able to do things like parse networks from many formats, find connected components, detect communities, and run centrality measures like PageRank.

I am attracted to NetworkX because it has a nice api, good documentation, and has been under active development for years. Plus because it is in python, it should be quick to develop with.

In a recent presentation (the slides are available on github here), it was claimed that:

The presentation also states that the base algorithms of NetworkX are implemented in C/Fortran.

However, looking at the source code, it looks like NetworkX is mostly written in python. I am not too familiar with the source code, but I am aware of a couple of examples where NetworkX uses numpy to do heavy lifting (which in turn uses C/Fortran to do linear algebra). For example, the file networkx/networkx/algorithms/centrality/eigenvector.py uses numpy to calculate eigenvectors.

Does anyone know if this strategy of calling an optimized library like numpy is really prevalent throughout NetworkX, or if just a few algorithms do it? Also can anyone describe other scalability issues associated with NetworkX?

Reply from NetworkX Lead ProgrammerI posed this question on the NetworkX mailing list, and Aric Hagberg replied:

解决方案

Your big issue will be memory. Python simply cannot handle tens of millions of objects, without jumping through hoops in your class implementation. The memory overhead of many objects is too high, you hit 2GB, and 32 bit code won't work. There's ways around it - using slots, arrays, or numpy. It should be OK, because networkx was written for performance, but if there's a few things that just don't work I'd check your memory usage.

As for scaling, algorithms are basically the only thing that matter with graphs. Graph algorithms tend to have really ugly scaling if they are done wrong, and they are just as likely to be done right in Python as any other language.

这篇关于NetworkX有哪些可扩展性问题?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-24 02:52