Problem description
I have a sparse matrix. I need to sort this matrix row by row and create another [sparse] matrix. Code may explain it better:
# The `rand` function requires a recent version of SciPy.
from scipy.sparse import rand
m = rand(6, 6, density=0.6)
d = m.getrow(0)
print(d)
Output1
(0, 5) 0.874881629788
(0, 4) 0.352559852239
(0, 2) 0.504791645463
(0, 1) 0.885898140175
I have this matrix m. I want to create a new matrix that holds a sorted version of m. The new matrix should contain a 0th row like this:
new_d = new_m.getrow(0)
print(new_d)
Output2
(0, 1) 0.885898140175
(0, 5) 0.874881629788
(0, 2) 0.504791645463
(0, 4) 0.352559852239
That way I can tell which columns hold the larger values, and so on:
print(new_d.indices)
Output3
array([1, 5, 2, 4])
Of course, every row should be sorted independently, as shown above.
I have one solution for this problem, but it is not elegant.
Recommended answer
If you're willing to ignore the zero-valued elements of the matrix, the code below should work. It is also much faster than implementations that call the getrow method for each row, which is rather slow.
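def sort_coo(m):
    # m must be in COO format; on Python 2, itertools.izip avoids building a list here.
    tuples = zip(m.row, m.col, m.data)
    # Sort by row first, then by value (ascending) within each row.
    return sorted(tuples, key=lambda x: (x[0], x[2]))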
For example:
>>> from numpy.random import rand
>>> from scipy.sparse import coo_matrix
>>>
>>> d = rand(10, 20)
>>> d[d > .05] = 0
>>> s = coo_matrix(d)
>>> sort_coo(s)
[(0, 2, 0.004775589084940246),
(3, 12, 0.029941507166614145),
(5, 19, 0.015030386789436245),
(7, 0, 0.0075044957259399192),
(8, 3, 0.047994403933129481),
(8, 5, 0.049401058471327031),
(9, 15, 0.040011608000125043),
(9, 8, 0.048541825332137023)]
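The same row-then-value ordering can also be produced without a Python-level sort. The following is a minimal sketch, assuming NumPy is available; `sort_coo_lexsort` is a hypothetical helper name, not part of SciPy. It relies on `np.lexsort`, which treats the last key it is given as the primary sort key.

```python
import numpy as np
from scipy.sparse import coo_matrix

def sort_coo_lexsort(m):
    """Return (row, col, value) arrays ordered by row, then by value ascending."""
    m = coo_matrix(m)  # accepts a dense array or any SciPy sparse format
    # np.lexsort sorts by the LAST key first, so row is the primary key
    # and the stored value breaks ties within a row.
    order = np.lexsort((m.data, m.row))
    return m.row[order], m.col[order], m.data[order]

dense = np.array([[0.0, 0.9, 0.5, 0.0],
                  [0.3, 0.0, 0.0, 0.1]])
rows, cols, vals = sort_coo_lexsort(dense)
```

As with `sort_coo`, only the stored (non-zero) entries take part in the ordering.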
Depending on your needs, you may want to tweak the sort keys in the lambda or process the output further. If you want everything in a row-indexed dictionary, you could do:
from collections import defaultdict
sorted_rows = defaultdict(list)
for i in sort_coo(m):
    sorted_rows[i[0]].append((i[1], i[2]))
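The question asks for the largest value first and for the column indices of each row in that order. One way to get there is to negate the value in the sort key. This is only a sketch under that assumption; `columns_by_value_desc` is a hypothetical name, and the second row of the example matrix is made-up data for illustration.

```python
from collections import defaultdict

import numpy as np
from scipy.sparse import coo_matrix

def columns_by_value_desc(m):
    """Map each row index to its column indices, largest stored value first."""
    m = coo_matrix(m)  # accepts a dense array or any SciPy sparse format
    # Negating the value in the key puts larger values first within a row.
    triples = sorted(zip(m.row, m.col, m.data), key=lambda t: (t[0], -t[2]))
    ranked = defaultdict(list)
    for r, c, _ in triples:
        ranked[int(r)].append(int(c))
    return dict(ranked)

# Row 0 reuses the values shown in the question's Output1;
# row 1 is made-up data for illustration.
example = np.zeros((2, 6))
example[0, [5, 4, 2, 1]] = [0.874881629788, 0.352559852239,
                            0.504791645463, 0.885898140175]
example[1, [0, 3]] = [0.2, 0.7]
ranking = columns_by_value_desc(example)
```

For row 0 this gives the column order 1, 5, 2, 4, matching the question's Output3. Rows with no stored entries do not appear in the dictionary.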