I have a DataFrame containing information about transactions between pairs of companies:
df
   idA  idB  amount nameA nameB
0    4    5     300   xxx   yyy
1    3    7     150   kkk   uuu
2    3    6     289   kkk   vvv
3    1    4     189   hhh   iii
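I want to create a network from it using the networkx package:
import networkx as nx

G = nx.Graph()
for i in df.index:
    # Add both endpoints with their names, then the weighted edge between them.
    G.add_node(df['idA'][i], name=df['nameA'][i])
    G.add_node(df['idB'][i], name=df['nameB'][i])
    G.add_edge(df['idA'][i], df['idB'][i], weight=df['amount'][i])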
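I would like to know whether there is a more efficient way to do this.
Best answer
Yes, there is. Have a look at this documentation: https://networkx.github.io/documentation/latest/reference/generated/networkx.convert_matrix.from_pandas_edgelist.html
In your case, I would do: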
G=nx.from_pandas_edgelist(df, 'idA', 'idB', ['amount'])
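One detail to be aware of: the loop in the question stores the amount under the edge key weight, whereas the call above stores it under amount. The following is a minimal sketch, assuming the sample data from the question, that renames the column first so the edge attribute matches the loop. The names edges and G_weighted are introduced here only for illustration.

```python
import pandas as pd
import networkx as nx

# Sample data copied from the question.
df = pd.DataFrame({
    "idA": [4, 3, 3, 1],
    "idB": [5, 7, 6, 4],
    "amount": [300, 150, 289, 189],
    "nameA": ["xxx", "kkk", "kkk", "hhh"],
    "nameB": ["yyy", "uuu", "vvv", "iii"],
})

# Rename on a copy so the edge attribute is called "weight", as in the loop.
edges = df.rename(columns={"amount": "weight"})
G_weighted = nx.from_pandas_edgelist(
    edges, source="idA", target="idB", edge_attr="weight"
)

print(list(G_weighted.edges(data=True)))
```

Keeping the key as weight matters mainly if you later call algorithms that look for an edge attribute with that name by default.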
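If you want to add other attributes to the nodes, follow this documentation: https://networkx.github.io/documentation/networkx-1.9/reference/generated/networkx.classes.function.set_node_attributes.html (note that this page describes networkx 1.9; from 2.0 onward the argument order is set_node_attributes(G, values, name)).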
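As a rough sketch of how the node names from the question could be attached after building the graph from the edge list: the code below assumes the sample data from the question, and the names `names` and `G_named` are introduced here for illustration only. Keyword arguments are used for set_node_attributes because the positional order differs between networkx 1.x and 2.x.

```python
import pandas as pd
import networkx as nx

# Sample data copied from the question.
df = pd.DataFrame({
    "idA": [4, 3, 3, 1],
    "idB": [5, 7, 6, 4],
    "amount": [300, 150, 289, 189],
    "nameA": ["xxx", "kkk", "kkk", "hhh"],
    "nameB": ["yyy", "uuu", "vvv", "iii"],
})

G_named = nx.from_pandas_edgelist(df, "idA", "idB", edge_attr="amount")

# Build one id -> name mapping from both columns.
# If an id appears with two different names, the idB entry wins here.
names = dict(zip(df["idA"], df["nameA"]))
names.update(zip(df["idB"], df["nameB"]))

# Attach the mapping to the nodes under the attribute key "name".
nx.set_node_attributes(G_named, values=names, name="name")

print(dict(G_named.nodes(data="name")))
```

In the sample data, id 4 appears once as idA (name xxx) and once as idB (name iii). The original loop also ends up with iii for that node, because the later row overwrites the earlier one, but with other data the two approaches could resolve such conflicts differently.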
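Edit:
Sorry, I had not noticed that from_pandas_dataframe was removed in networkx 2.0; from_pandas_edgelist is its replacement. Many thanks to @tohv for answering that question. Finally, as I said in my comment, these are optimized functions. Compared with a for loop doing the same work, they are consistently faster; in the timings below the gap is roughly 40x.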
from random import randint
import pandas as pd
import networkx as nx
from time import time
import numpy as np

# 10,000 random edges between node ids 0..100
df = pd.DataFrame()
df['a'] = [randint(0, 100) for _ in range(10000)]
df['b'] = [randint(0, 100) for _ in range(10000)]

# Time the row-by-row loop and average over the runs.
c = 0
runs = []
while c <= 100:
    G = nx.Graph()
    start = time()
    for i in df.index:
        G.add_node(df['a'][i], name=df['a'][i])
        G.add_node(df['b'][i], name=df['b'][i])
        G.add_edge(df['a'][i], df['b'][i])
    run = time() - start
    runs.append(run)
    c += 1
print('done in:', np.mean(runs), 'seconds')
done in: 0.6191224154859486 seconds
# Time from_pandas_edgelist on the same data.
c = 0
runs = []
while c <= 100:
    start = time()
    G = nx.from_pandas_edgelist(df, 'a', 'b')
    run = time() - start
    runs.append(run)
    c += 1
print('done in:', np.mean(runs), 'seconds')
done in: 0.014413160852866598 seconds
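For anyone who wants to repeat the comparison, here is one possible way to write it with the standard timeit module. This is only a sketch under a few assumptions: it times edge creation alone (no node names), the helper names are made up for this example, and build_by_zip is an extra variant not in the original answer, added to show how much of the loop's cost may come from the per-row df['a'][i] lookups. Timings will vary by machine and library version.

```python
import timeit
from random import randint

import networkx as nx
import pandas as pd

# Same shape of random data as in the benchmark above.
df = pd.DataFrame({
    "a": [randint(0, 100) for _ in range(10000)],
    "b": [randint(0, 100) for _ in range(10000)],
})


def build_by_index(frame):
    """Row-by-row construction with label lookups, as in the question."""
    G = nx.Graph()
    for i in frame.index:
        G.add_edge(frame["a"][i], frame["b"][i])
    return G


def build_by_zip(frame):
    """Illustrative variant: iterate the two columns together, no lookups."""
    G = nx.Graph()
    G.add_edges_from(zip(frame["a"], frame["b"]))
    return G


def build_by_edgelist(frame):
    """The approach recommended in the answer."""
    return nx.from_pandas_edgelist(frame, "a", "b")


def edge_set(G):
    """Order-independent view of the edges, for comparing graphs."""
    return {frozenset(edge) for edge in G.edges()}


for build in (build_by_index, build_by_zip, build_by_edgelist):
    # Take the best of three single runs to reduce noise.
    best = min(timeit.repeat(lambda: build(df), number=1, repeat=3))
    print(f"{build.__name__}: best of 3 = {best:.4f} s")
```

The edge_set helper is there so you can check that all three variants produce the same graph before trusting the timings.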
Regarding "python - Python: best way to create a network?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50602610/