Given transactions between nodes in a JSON file (possibly 2 GB+), with ~1 million nodes and ~10 million transactions, each transaction involving 10–1000 nodes, e.g.
{"transactions":
[
{"transaction 1": ["node1","node2","node7"], "weight":0.41},
{"transaction 2": ["node4","node2","node1","node3","node10","node7","node9"], "weight":0.67},
{"transaction 3": ["node3","node10","node11","node2","node1"], "weight":0.33},...
]
}
What is the most elegant and efficient Pythonic way to convert this into a node affinity matrix, where the affinity aggregates the weights of the transactions shared by a pair of nodes?
affinity[i,j] = weighted transaction count between nodes[i] and nodes[j] = affinity[j,i]
e.g.
affinity[node1, node7] = [0.41 (transaction 1) + 0.67 (transaction 2)] / 2 = affinity[node7, node1]
Note: the affinity matrix is symmetric, so computing only the lower triangle is sufficient.
The values below are not representative *** structural example only!

        node1 | node2 | node3 | node4 | ...
node1     1      .4      .1     0.9   ...
node2    0.4     1       .6     0.3   ...
node3    .1      .6      1      .7    ...
node4    0.9     0.3     0.7    1     ...
Best answer
First, I would clean up the data, representing each node by an integer, and start from a dict like this
data=[{'transaction': [1, 2, 7], 'weight': 0.41},
{'transaction': [4, 2, 1, 3, 10, 7, 9], 'weight': 0.67},
{'transaction': [3, 10, 11, 2, 1], 'weight': 0.33}]
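The cleanup step itself isn't shown in the answer; a minimal sketch of one way to do it, assuming node names follow the "node&lt;k&gt;" pattern from the question (the helper name `clean` is my own):

```python
import json
import re

def clean(raw):
    """Convert the raw JSON structure into the integer-keyed form above.

    Assumes each transaction dict holds one node-list entry (whose key we
    ignore) plus a "weight" entry, and node names look like "node<k>".
    """
    data = []
    for tx in raw["transactions"]:
        # pick out the node list: the only value whose key is not "weight"
        nodes = next(v for k, v in tx.items() if k != "weight")
        data.append({
            "transaction": [int(re.sub(r"\D", "", n)) for n in nodes],
            "weight": tx["weight"],
        })
    return data

raw = json.loads('''{"transactions": [
    {"transaction 1": ["node1", "node2", "node7"], "weight": 0.41}
]}''')
print(clean(raw))  # [{'transaction': [1, 2, 7], 'weight': 0.41}]
```

For a real 2 GB file you would feed this through a streaming parser rather than `json.loads`, but the per-transaction conversion is the same.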
Not sure whether this is Pythonic enough, but it should be self-explanatory
def weight(i, j, data_item):
    # weight of a transaction if it contains both i and j, else 0
    return data_item["weight"] if i in data_item["transaction"] and j in data_item["transaction"] else 0

def affinity(i, j):
    if j < i:  # matrix is symmetric
        return affinity(j, i)
    weights = [weight(i, j, data_item) for data_item in data if weight(i, j, data_item) != 0]
    if len(weights) == 0:
        return 0
    return sum(weights) / float(len(weights))

ln = 10  # number of nodes
A = [[affinity(i, j) for j in range(1, ln + 1)] for i in range(1, ln + 1)]
To inspect the affinity matrix
import numpy as np
print(np.array(A))
[[ 0.47 0.47 0.5 0.67 0. 0. 0.54 0. 0.67 0.5 ]
[ 0.47 0.47 0.5 0.67 0. 0. 0.54 0. 0.67 0.5 ]
[ 0.5 0.5 0.5 0.67 0. 0. 0.67 0. 0.67 0.5 ]
[ 0.67 0.67 0.67 0.67 0. 0. 0.67 0. 0.67 0.67]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0.54 0.54 0.67 0.67 0. 0. 0.54 0. 0.67 0.67]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[ 0.67 0.67 0.67 0.67 0. 0. 0.67 0. 0.67 0.67]
[ 0.5 0.5 0.5 0.67 0. 0. 0.67 0. 0.67 0.5 ]]
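Since the answer's nested scan is quadratic in the node count, it won't scale to the ~1M nodes / ~10M transactions from the question. A sketch of a single-pass alternative (my own variant, not from the answer) that accumulates per-pair sums and counts with `itertools.combinations`, then averages:

```python
from itertools import combinations
from collections import defaultdict
import numpy as np

def affinity_matrix(data, n):
    """One pass over the transactions; affinity is the average weight of
    the transactions shared by a node pair, matching the answer above."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for item in data:
        w = item["weight"]
        # dedupe, and drop nodes outside 1..n (the sample data mentions
        # node 11, which the 10x10 matrix above never queries)
        nodes = sorted({i for i in item["transaction"] if i <= n})
        for pair in combinations(nodes, 2):
            sums[pair] += w
            counts[pair] += 1
        for i in nodes:  # diagonal, as in the brute-force version
            sums[(i, i)] += w
            counts[(i, i)] += 1
    A = np.zeros((n, n))
    for (i, j), s in sums.items():
        A[i - 1, j - 1] = A[j - 1, i - 1] = s / counts[(i, j)]  # 1-based nodes
    return A

data = [{'transaction': [1, 2, 7], 'weight': 0.41},
        {'transaction': [4, 2, 1, 3, 10, 7, 9], 'weight': 0.67},
        {'transaction': [3, 10, 11, 2, 1], 'weight': 0.33}]
print(np.round(affinity_matrix(data, 10), 2))
```

On the sample data this reproduces the matrix printed above; the work is proportional to the number of node pairs per transaction rather than to n², though with 10–1000 nodes per transaction the pair counts themselves can get large, so a sparse result structure may be preferable to a dense n x n array.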
Source: "python - How to efficiently construct an affinity matrix from transaction rows?" on Stack Overflow: https://stackoverflow.com/questions/44451015/