Problem description
I have a very large dictionary with tuples as keys and counts as values. This dictionary is supposed to represent an adjacency matrix of word co-occurrence vectors, e.g. 'work' appears with 'experience' 16 times and 'work' appears with 'services' 25 times. Whether or not this is the preferred storage method is another issue (with the massive amount of data I have, nested dictionaries became a nightmare to traverse), but it's simply what I have right now.
Frequency:{
('work', 'experience'): 16,
('work', 'services'): 25,
('must', 'services'): 15,
('data', 'services'): 10,
...
...}
Thanks to a previous post, I've been able to do a simple binary adjacency matrix with NetworkX, simply by using this methodology:
A=Frequency.keys()
networkx.Graph(A)
That result was great, but my question is: what do I have to do to convert Frequency into an adjacency matrix that uses each co-occurrence count as the value in the matrix, so that the result would look something along the lines of this:
array([[ 0., 16., 25., 0.],
[ 16., 0., 1., 0.],
[ 25., 1., 0., 1.],
[ 10., 0., 0., 0.]
...)
I apologize if this is similar to previous posts, but I just can't find the correct way to convert these tuples to a matrix that I can use in NetworkX. I'm assuming I would use numpy, but I cannot find any documentation for a method like this.
Thanks in advance,
Ron
This answer may be of help. With your sample data:
>>> import numpy as np
>>> frequency = {('work', 'experience'): 16,
... ('work', 'services'): 25,
... ('must', 'services'): 15,
... ('data', 'services'): 10}
>>> # list() is needed on Python 3, where keys() and values() return views
>>> keys = np.array(list(frequency.keys()))
>>> vals = np.array(list(frequency.values()))
>>> keys
array([['work', 'experience'],
       ['work', 'services'],
       ['must', 'services'],
       ['data', 'services']], dtype='<U10')
>>> vals
array([16, 25, 15, 10])
>>> unq_keys, key_idx = np.unique(keys, return_inverse=True)
>>> key_idx = key_idx.reshape(-1, 2)
>>> unq_keys
array(['data', 'experience', 'must', 'services', 'work'], dtype='<U10')
>>> key_idx
array([[4, 1],
       [4, 3],
       [2, 3],
       [0, 3]])
>>> n = len(unq_keys)
>>> adj = np.zeros((n, n), dtype=vals.dtype)
>>> adj[key_idx[:, 0], key_idx[:, 1]] = vals
>>> adj
array([[ 0, 0, 0, 10, 0],
[ 0, 0, 0, 0, 0],
[ 0, 0, 0, 15, 0],
[ 0, 0, 0, 0, 0],
[ 0, 16, 0, 25, 0]])
>>> adj += adj.T
>>> adj
array([[ 0, 0, 0, 10, 0],
[ 0, 0, 0, 0, 16],
[ 0, 0, 0, 15, 0],
[10, 0, 15, 0, 25],
[ 0, 16, 0, 25, 0]])
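For reuse, the same steps can be wrapped in a function. This is only a sketch under a few assumptions: the name `dict_to_adjacency` is made up for illustration, every key is a 2-tuple of strings, and each pair is stored in one direction only (if both `('a', 'b')` and `('b', 'a')` were present their counts would be added together, and a self-pair such as `('a', 'a')` would be doubled by the final symmetrization).

```python
import numpy as np


def dict_to_adjacency(frequency):
    """Hypothetical helper: turn {(word_a, word_b): count} into (labels, adj).

    labels is the sorted array of distinct words; adj[i, j] holds the count
    for the pair (labels[i], labels[j]), mirrored so the matrix is symmetric.
    """
    # list() is required on Python 3, where keys()/values() return views.
    keys = np.array(list(frequency.keys()))      # shape (m, 2)
    vals = np.array(list(frequency.values()))    # shape (m,)

    # Map every word to its position in the sorted list of unique words.
    labels, idx = np.unique(keys, return_inverse=True)
    idx = idx.reshape(-1, 2)                     # one (row, col) pair per key

    n = len(labels)
    adj = np.zeros((n, n), dtype=vals.dtype)
    adj[idx[:, 0], idx[:, 1]] = vals             # fill one triangle
    adj += adj.T                                 # mirror it for an undirected graph
    return labels, adj


frequency = {
    ("work", "experience"): 16,
    ("work", "services"): 25,
    ("must", "services"): 15,
    ("data", "services"): 10,
}
labels, adj = dict_to_adjacency(frequency)
print(labels)
print(adj)
```

Row and column `i` of `adj` correspond to `labels[i]`. If NetworkX is the destination, `networkx.from_numpy_array(adj)` should accept this matrix and store each nonzero entry as an edge `weight`; the nodes come back as integer positions, so `labels` is what maps them back to words.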