Question
I'd like to create some NetworkX graphs from a simple Pandas DataFrame:
      Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7
Foo       0      0      1      1      0      0      0
Bar       0      0      1      1      0      1      1
Baz       0      0      1      0      0      0      0
Bat       0      0      1      0      0      1      0
Quux      1      0      0      0      0      0      0
Where Foo… is the index, and Loc 1 to Loc 7 are the columns. But converting to NumPy matrices or recarrays doesn't seem to work for generating input for nx.Graph(). Is there a standard strategy for achieving this? I'm not averse to reformatting the data in Pandas --> dumping to CSV --> importing to NetworkX, but it seems as if I should be able to generate the edges from the index and the nodes from the values.
Recommended answer
NetworkX expects a square matrix (of nodes and edges), perhaps* you want to pass it:
In [11]: df2 = pd.concat([df, df.T]).fillna(0)
Note: it's important that the index and columns are in the same order!
In [12]: df2 = df2.reindex(df2.columns)
In [13]: df2
Out[13]:
       Bar  Bat  Baz  Foo  Loc 1  Loc 2  Loc 3  Loc 4  Loc 5  Loc 6  Loc 7  Quux
Bar      0    0    0    0      0      0      1      1      0      1      1     0
Bat      0    0    0    0      0      0      1      0      0      1      0     0
Baz      0    0    0    0      0      0      1      0      0      0      0     0
Foo      0    0    0    0      0      0      1      1      0      0      0     0
Loc 1    0    0    0    0      0      0      0      0      0      0      0     1
Loc 2    0    0    0    0      0      0      0      0      0      0      0     0
Loc 3    1    1    1    1      0      0      0      0      0      0      0     0
Loc 4    1    0    0    1      0      0      0      0      0      0      0     0
Loc 5    0    0    0    0      0      0      0      0      0      0      0     0
Loc 6    1    1    0    0      0      0      0      0      0      0      0     0
Loc 7    1    0    0    0      0      0      0      0      0      0      0     0
Quux     0    0    0    0      1      0      0      0      0      0      0     0
In [14]: graph = nx.from_numpy_matrix(df2.values)  # removed in NetworkX 3.0; use nx.from_numpy_array(df2.values) there
This doesn't pass the column/index names to the graph. If you want to do that, you could use relabel_nodes (you may have to be wary of duplicates, which are allowed in pandas' DataFrames):
In [15]: graph = nx.relabel_nodes(graph, dict(enumerate(df2.columns))) # is there nicer way than dict . enumerate ?
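For reference, the steps above can be collected into one self-contained sketch. It assumes a reasonably recent NetworkX (2.x or later) where nx.from_pandas_adjacency is available; that function takes a square DataFrame whose index and columns match and keeps the labels, so the separate relabel_nodes step isn't needed. The df literal just retypes the table from the question, and the astype(int) call is an addition to undo the float conversion that fillna leaves behind.

```python
import pandas as pd
import networkx as nx

# The question's table, retyped by hand.
df = pd.DataFrame(
    [
        [0, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0],
    ],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# Stack the frame on its transpose to get a square, symmetric table,
# then put the rows in the same order as the columns.
df2 = pd.concat([df, df.T]).fillna(0).astype(int)
df2 = df2.reindex(df2.columns)

# Build an undirected graph; node names come from the index/columns.
graph = nx.from_pandas_adjacency(df2)

print(sorted(graph.edges()))
```

Rows and columns with no 1s (here Loc 2 and Loc 5) still appear in the graph, as isolated nodes.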
*It's unclear what the columns and index represent for the graph you want.
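If the rows and columns are meant to be two different kinds of node (an assumption; the question doesn't say), one possible reading is a bipartite graph with an edge wherever the table holds a 1. The following is only a sketch under that assumption: it skips the square matrix and adds the edges directly. The `bipartite` node attribute follows the convention NetworkX's bipartite functions use, and the df literal again retypes the question's table.

```python
import pandas as pd
import networkx as nx

# The question's table, retyped by hand.
df = pd.DataFrame(
    [
        [0, 0, 1, 1, 0, 0, 0],
        [0, 0, 1, 1, 0, 1, 1],
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 0, 0],
    ],
    index=["Foo", "Bar", "Baz", "Bat", "Quux"],
    columns=[f"Loc {i}" for i in range(1, 8)],
)

# One (row label, column label) pair for every cell equal to 1.
edges = [
    (row, col)
    for row in df.index
    for col in df.columns
    if df.at[row, col] == 1
]

B = nx.Graph()
B.add_nodes_from(df.index, bipartite=0)    # row labels form one node set
B.add_nodes_from(df.columns, bipartite=1)  # column labels form the other
B.add_edges_from(edges)

print(sorted(B.neighbors("Bar")))
```

As with the square-matrix approach, duplicate labels in the index or columns would collapse into a single node, so it is worth checking for them first.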