Problem description
I have a dataframe of possible network connections with the columns df = pd.DataFrame(columns=["A", "B", "Count", "some_attribute"]). Each row of this dataframe represents a connection like this:
- A has a connection to B
- This connection occurred "Count" times
- This connection has a specific attribute (i.e. a specific type of contact)
I want to export this DataFrame to the GraphML format. It works fine using the following code:
import networkx as nx
G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values)
nx.write_graphml(G, "my_graph.graphml")
This code produces a GraphML file with the correct graph, which I can use with Gephi. Now I want to add an attribute:
G = nx.Graph()
G.add_weighted_edges_from(df[["A", "B", "Count"]].values, attr=df["some_attribute"].values)
nx.write_graphml(G, "my_graph.graphml")
Whenever I add attributes this way, the graph can no longer be written to a GraphML file. With this code, I get the following error message:
NetworkXError: GraphML writer does not support <class 'numpy.ndarray'> as data values.
I found related articles (like this one), but they didn't provide a solution for this problem. Does anyone have a solution for adding attributes to a GraphML file using networkx so that I can use them in Gephi?
Recommended answer
Assuming the following example DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [0,1,2,0,0],
'B': [1,2,3,2,3],
'Count': [1,2,5,1,1],
'some_attribute': ['red','blue','red','blue','red']})
   A  B  Count some_attribute
0  0  1      1            red
1  1  2      2           blue
2  2  3      5            red
3  0  2      1           blue
4  0  3      1            red
Following the code from above to instantiate a Graph:
import networkx as nx
G = nx.Graph()
G.add_weighted_edges_from(df[["A","B", "Count"]].values, attr=df["some_attribute"].values)
When inspecting an edge, it appears that the entire numpy array, df['some_attribute'].values, gets assigned as an attribute to every edge:
# G[u][v] returns the edge's attribute dict (G.edge was removed in NetworkX 2.0)
print(G[0][1])
print(G[2][3])
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 1}
{'attr': array(['red', 'blue', 'red', 'blue', 'red'], dtype=object), 'weight': 5}
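The reason for this is that add_weighted_edges_from() applies any extra keyword argument to every edge it adds, rather than distributing the values one per edge. The following is a minimal sketch of that behavior; the small edge list and the names colors and shared are made up for illustration and are not part of the question.

```python
import networkx as nx
import numpy as np

# Hypothetical stand-in for df["some_attribute"].values
colors = np.array(["red", "blue", "red"], dtype=object)

G = nx.Graph()
# Extra keyword arguments (here: attr=...) are applied to ALL edges as-is
G.add_weighted_edges_from([(0, 1, 1), (1, 2, 2), (2, 3, 5)], attr=colors)

# Every edge holds a reference to the very same array object
shared = [data["attr"] is colors for _, _, data in G.edges(data=True)]
print(shared)
```

Because each edge then carries a whole numpy array as its value, the GraphML writer has no scalar it can serialize, which matches the error in the question.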
If I understand your intent correctly, you want each edge's attribute to be the value from the matching row of the df['some_attribute'] column.
You may find it easier to create your Graph using nx.from_pandas_dataframe(), especially since your data is already in a DataFrame object. (In NetworkX 2.0 and later, this function is named nx.from_pandas_edgelist().)
G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])
print(G[0][1])
print(G[2][3])
{'Count': 1, 'some_attribute': 'red'}
{'Count': 5, 'some_attribute': 'red'}
Writing to file is no problem:
nx.write_graphml(G,"my_graph.graphml")
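For readers on NetworkX 2.0 or later, where from_pandas_dataframe() no longer exists, the same steps can be sketched with from_pandas_edgelist(). This is an illustrative equivalent rather than the code from the original answer; it writes to an in-memory buffer (an assumption made here so the sketch leaves no file behind), but a filename such as "my_graph.graphml" works the same way.

```python
import io

import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

# Each row becomes one edge; the listed columns become per-edge attributes
G = nx.from_pandas_edgelist(df, source='A', target='B',
                            edge_attr=['Count', 'some_attribute'])

# write_graphml accepts a filename or a binary file-like object
buffer = io.BytesIO()
nx.write_graphml(G, buffer)
graphml_text = buffer.getvalue().decode('utf-8')

print(G[0][1])
print(G[2][3])
```

Each edge now holds plain scalar values, so the writer can serialize them without the ndarray error.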
One caveat: I'm not a regular Gephi user, so there may be another way to solve the following. When I loaded the file with 'Count' as the edge attribute, Gephi didn't recognize the edge weights by default. So I renamed the column from 'Count' to 'weight', which Gephi then picked up as the edge weight when I loaded the file:
df.columns=['A', 'B', 'weight', 'some_attribute']
G = nx.from_pandas_dataframe(df, 'A', 'B', ['weight', 'some_attribute'])
nx.write_graphml(G,"my_graph.graphml")
Hope this helps and that I understood your question correctly.
Per Corley's comment above, you can use the following if you choose to use add_edges_from:
G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])
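As a variation on the same idea, the rows can also be iterated with DataFrame.itertuples(), which keeps the column names visible in the edge construction. This is only a sketch on the small example DataFrame, not something taken from the original answer.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 0, 0],
                   'B': [1, 2, 3, 2, 3],
                   'Count': [1, 2, 5, 1, 1],
                   'some_attribute': ['red', 'blue', 'red', 'blue', 'red']})

G = nx.Graph()
# One (u, v, attribute-dict) triple per row, so every edge gets its own values
G.add_edges_from(
    (row.A, row.B, {'weight': row.Count, 'attr': row.some_attribute})
    for row in df.itertuples(index=False)
)

print(G[2][3])
```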
There is no significant performance difference between the two approaches, but I find from_pandas_dataframe more readable.
import numpy as np
df = pd.DataFrame({'A': np.arange(0,1000000),
'B': np.arange(1,1000001),
'Count': np.random.choice(range(10), 1000000, replace=True),
'some_attribute': np.random.choice(['red','blue'], 1000000, replace=True,)})
%%timeit
G = nx.Graph()
G.add_edges_from([(u,v,{'weight': w, 'attr': a}) for u,v,w,a in df[['A', 'B', 'Count', 'some_attribute']].values ])
1 loop, best of 3: 4.23 s per loop
%%timeit
G = nx.Graph()
G = nx.from_pandas_dataframe(df, 'A', 'B', ['Count', 'some_attribute'])
1 loop, best of 3: 3.93 s per loop