



In SciPy, when I multiply a slice of a sparse matrix by an array containing only zeros, the resulting matrix has at least as many stored elements as before, even though it should have at most as many. The same holds for setting parts of the matrix to 0 or False:

>>> import numpy as np
>>> from scipy.sparse import csr_matrix as csr
>>> M = csr(np.random.random((8,8))>0.9)
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 6 stored elements in Compressed Sparse Row format>
>>> M[:,0] = False
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 12 stored elements in Compressed Sparse Row format>
>>> M[:,0].multiply(np.array([[False] for i in xrange(8)]))
>>> M
<8x8 sparse matrix of type '<type 'numpy.bool_'>'
        with 12 stored elements in Compressed Sparse Row format>

This is actually computationally expensive for large matrices, because it iterates over all cells in the slice, not just the nonzero ones.

From a mathematical / logical point of view, when multiplying a sparse matrix or vector, all empty cells are certain to remain empty, since 0*x == 0. The same holds for setting to zero: cells that are already zero do not need to be explicitly set to zero.

What is the best way to deal with this?

I am using SciPy version 0.17.0.


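When working with sparse matrices, changing the sparsity pattern is generally a very expensive operation, so SciPy does not do this silently. As a result, a matrix can hold explicitly stored zeros, and the "stored elements" count in its repr can be larger than the number of values that are actually nonzero.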
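To make the difference between stored elements and nonzero values concrete, here is a minimal sketch. It assumes a SciPy version that provides `count_nonzero()` on sparse matrices, and it writes a zero directly into the `data` array only to produce an explicit stored zero in a predictable way; the tiny identity matrix is just an illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 3x3 identity matrix has three stored elements, all nonzero.
M = csr_matrix(np.eye(3))

# Overwrite one stored value with zero. The sparsity structure
# (indices, indptr) is untouched, so the entry stays stored.
M.data[0] = 0.0

stored_before = M.nnz                # counts stored entries, zeros included
nonzero_before = M.count_nonzero()   # counts entries whose value is nonzero

# Rebuild the structure without the explicit zero.
M.eliminate_zeros()
stored_after = M.nnz

print(stored_before, nonzero_before, stored_after)
```

The stored count only drops once `eliminate_zeros()` is called; until then `nnz` and `count_nonzero()` disagree.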

If you want to remove explicitly stored zeros from a sparse matrix, you should use the eliminate_zeros() method; for example:

>>> M = csr(np.random.random((1000,1000))>0.9, dtype=float)
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99740 stored elements in Compressed Sparse Row format>

>>> M[:, 0] *= 0
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99740 stored elements in Compressed Sparse Row format>

>>> M.eliminate_zeros()
>>> M
<1000x1000 sparse matrix of type '<class 'numpy.float64'>'
    with 99657 stored elements in Compressed Sparse Row format>

SciPy could call the eliminate_zeros() routine automatically after this kind of operation, but the developers chose to give the user more flexibility and control when doing something as expensive as changing the sparsity structure.
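If the goal is to zero a whole column without visiting empty cells, one possible approach (a sketch, not something the answer above prescribes) is to work on the CSR arrays directly. In CSR format, `indices` holds the column index of each stored value, so a boolean mask over it selects exactly the stored entries of that column. The helper name `zero_column` is hypothetical, and the code assumes the matrix is already in CSR format.

```python
import numpy as np
from scipy.sparse import csr_matrix

def zero_column(M, col):
    """Zero one column of a CSR matrix in place (hypothetical helper).

    Only stored entries are touched; no new entries are created.
    """
    # M.indices[k] is the column of the k-th stored value in M.data.
    M.data[M.indices == col] = 0
    # Remove the explicit zeros so the stored-element count shrinks.
    M.eliminate_zeros()
    return M

rng = np.random.RandomState(0)
M = csr_matrix(rng.random_sample((1000, 1000)) > 0.9, dtype=float)

nnz_before = M.nnz
stored_in_col = int((M.indices == 0).sum())

zero_column(M, 0)

# The stored count drops by exactly the number of entries
# that were stored in column 0.
print(nnz_before - M.nnz == stored_in_col)
```

The cost here is proportional to the number of stored elements rather than to the size of the slice, and the single `eliminate_zeros()` call can be deferred if several columns are being cleared in a row.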

