I have an m x m sparse matrix similarities and a vector combined_scales with m elements. I want to multiply the i-th column of similarities by combined_scales[i]. Here is my first attempt:

for i in range(m):
    scale = combined_scales[i]
    similarities[:, i] *= scale


This is semantically correct, but it performs poorly, so I tried changing it to:

# sparse.diags creates a diagonal matrix.
# docs: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.diags.html
similarities *= sparse.diags(combined_scales)


But I get a MemoryError as soon as this line runs. Strangely, it looks like scipy is trying to allocate a dense numpy array here:

Traceback (most recent call last):
  File "main.py", line 108, in <module>
    loop.run_until_complete(main())
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 466, in run_until_complete
    return future.result()
  File "main.py", line 100, in main
    magic.fit(df)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 127, in fit
    self._scale_similarities(X, net_similarities)
  File "C:\cygwin64\home\james\code\py\relativity\ml.py", line 148, in _scale_similarities
    similarities *= sparse.diags(combined_scales)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\base.py", line 440, in __mul__
    return self._mul_sparse_matrix(other)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\sparse\compressed.py", line 503, in _mul_sparse_matrix
    data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))
MemoryError


How do I prevent it from allocating a dense array here? Thanks.

Best Answer

From sparse.compressed:

class _cs_matrix    # common for csr and csc
    def _mul_sparse_matrix(self, other):
        M, K1 = self.shape
        K2, N = other.shape

        major_axis = self._swap((M,N))[0]
        other = self.__class__(other)  # convert to this format

        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=M*N)
        indptr = np.empty(major_axis + 1, dtype=idx_dtype)

        fn = getattr(_sparsetools, self.format + '_matmat_pass1')
        fn(M, N,
           np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           indptr)

        nnz = indptr[-1]
        idx_dtype = get_index_dtype((self.indptr, self.indices,
                                     other.indptr, other.indices),
                                    maxval=nnz)
        indptr = np.asarray(indptr, dtype=idx_dtype)
        indices = np.empty(nnz, dtype=idx_dtype)
        data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))

        fn = getattr(_sparsetools, self.format + '_matmat_pass2')
        fn(M, N, np.asarray(self.indptr, dtype=idx_dtype),
           np.asarray(self.indices, dtype=idx_dtype),
           self.data,
           np.asarray(other.indptr, dtype=idx_dtype),
           np.asarray(other.indices, dtype=idx_dtype),
           other.data,
           indptr, indices, data)

        return self.__class__((data,indices,indptr),shape=(M,N))


similarities is a sparse csr matrix. other (the diag matrix) has also been converted to csr format in

other = self.__class__(other)


csr_matmat_pass1 (compiled code) is run with the indices of self and other, and returns nnz, the number of nonzero terms in the output.

It then allocates the indptr, indices and data arrays that will hold the results of csr_matmat_pass2. These are used to create the returned matrix:

self.__class__((data,indices,indptr),shape=(M,N))


The error occurs when creating the data array:

data = np.empty(nnz, dtype=upcast(self.dtype, other.dtype))


So it is the memory for that many nonzero values in the result that it fails to allocate.

What is m, and what is similarities.nnz?

Is there enough memory to do similarities.copy()?
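
Those two numbers already tell you whether the product can fit. Here is a minimal diagnostic sketch (the helper name and the dtype choices are assumptions, not scipy API): since a diagonal factor cannot add nonzeros, the product needs at most similarities.nnz stored values plus the matching index arrays.

import numpy as np
from scipy import sparse

# Hypothetical helper: rough upper bound on the bytes that
# A * sparse.diags(scales) would need in CSR form. A diagonal factor
# cannot introduce new nonzeros, so the result has at most A.nnz entries.
def product_memory_bound(A, data_dtype=np.float64, index_dtype=np.int32):
    per_entry = np.dtype(data_dtype).itemsize + np.dtype(index_dtype).itemsize
    indptr_bytes = (A.shape[0] + 1) * np.dtype(index_dtype).itemsize
    return A.nnz * per_entry + indptr_bytes

A = sparse.random(10000, 10000, density=0.001, format='csr')
print(A.nnz, product_memory_bound(A))  # nonzeros, rough byte estimate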

When you use similarities *= ..., it first has to do similarities * other. The result then replaces self; it does not attempt an in-place multiplication.
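
If the product itself is what runs out of memory, the scaling can also be applied to the stored values directly. A minimal in-place sketch, assuming similarities is CSR and combined_scales is a 1-D NumPy array: in CSR, indices holds the column index of every stored value, so one fancy-indexed multiply scales column i by combined_scales[i] without building a second matrix.

import numpy as np
from scipy import sparse

m = 5
similarities = sparse.random(m, m, density=0.4, format='csr')
combined_scales = np.arange(1, m + 1, dtype=float)

# data[k] belongs to column indices[k], so this multiplies each stored value
# by its column's scale, entirely in place and with no temporary product.
similarities.data *= combined_scales[similarities.indices]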

Iterating by column

There are a number of questions about faster iteration by row (or column), trying to do things such as sorting or getting the largest row values. Working directly with the csr attributes can speed this up considerably. I think the idea applies here.

Example:

In [275]: A = sparse.random(10,10,.2,'csc').astype(int)
In [276]: A.data[:] = np.arange(1,21)
In [277]: A.A
Out[277]:
array([[ 0,  0,  4,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  3,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  0,  0,  0,  0, 10,  0,  0, 16, 18],
       [ 0,  0,  0,  0,  0, 11, 14,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  8,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  9, 12,  0,  0, 17,  0],
       [ 2,  0,  0,  0,  0, 13,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  0,  0,  0, 15,  0, 19],
       [ 0,  0,  6,  0,  0,  0,  0,  0,  0, 20]])
In [280]: B = sparse.diags(np.arange(1,11),dtype=int)
In [281]: B
Out[281]:
<10x10 sparse matrix of type '<class 'numpy.int64'>'
    with 10 stored elements (1 diagonals) in DIAgonal format>
In [282]: (A*B).A
Out[282]:
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]], dtype=int64)


Iterating over the columns:

In [283]: A1=A.copy()
In [284]: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:
In [285]: A1.A
Out[285]:
array([[  0,   0,  12,   0,   0,   0,   0,   0,   0,   0],
       [  0,   6,   0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,  60,   0,   0, 144, 180],
       [  0,   0,   0,   0,   0,  66,  98,   0,   0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  40,   0,   0,   0,   0,   0],
       [  0,   0,   0,   0,  45,  72,   0,   0, 153,   0],
       [  2,   0,   0,   0,   0,  78,   0,   0,   0,   0],
       [  0,   0,  15,  28,   0,   0,   0, 120,   0, 190],
       [  0,   0,  18,   0,   0,   0,   0,   0,   0, 200]])


Timing comparison:

In [287]: %%timeit A1=A.copy()
     ...: A1 *= B
     ...:
375 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [288]: %%timeit A1 = A.copy()
     ...: for i,j,v in zip(A1.indptr[:-1],A1.indptr[1:],np.arange(1,11)):
     ...:     A1.data[i:j] *= v
     ...:
79.9 µs ± 1.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
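
The Python-level loop can also be collapsed into a single vectorized multiply; a sketch under the same assumption that A1 is CSC, so each indptr slice is one column. np.repeat expands every scale to the number of stored values in its column.

import numpy as np

# A is the 10x10 CSC matrix built above; np.diff(A1.indptr) gives the number
# of stored values per column, so np.repeat lines each scale up with its data.
A1 = A.copy()
A1.data *= np.repeat(np.arange(1, 11), np.diff(A1.indptr))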
