How to elementwise-multiply a scipy.sparse matrix by a broadcasted dense 1d array?

Problem description

Suppose I have a 2d sparse array. In my real use case, both the number of rows and the number of columns are much larger (say 20000 and 50000), so it cannot fit in memory when a dense representation is used:

>>> import numpy as np
>>> import scipy.sparse as ssp

>>> a = ssp.lil_matrix((5, 3))
>>> a[1, 2] = -1
>>> a[4, 1] = 2
>>> a.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -1.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  2.,  0.]])
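To make the memory claim concrete, here is a rough back-of-the-envelope estimate for the sizes quoted above. The dense figure follows directly from 20000 × 50000 float64 entries. The sparse figure is only an illustration: it assumes a hypothetical density of 0.1% and CSR storage with 32-bit indices, neither of which is stated in the question.

```python
import numpy as np

# Shape quoted in the question for the real use case.
n_rows, n_cols = 20_000, 50_000
value_bytes = np.dtype(np.float64).itemsize  # 8 bytes per stored value
index_bytes = np.dtype(np.int32).itemsize    # assumption: 32-bit CSR indices

# Dense storage: every entry is stored, zero or not.
dense_bytes = n_rows * n_cols * value_bytes

# CSR storage, assuming (hypothetically) that 0.1% of entries are non-zero:
# one value + one column index per stored entry, plus a row-pointer array.
nnz = n_rows * n_cols // 1000
csr_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes

print(f"dense: {dense_bytes / 1e9:.1f} GB")
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")
```

The point is the scaling: dense storage grows with rows × columns, CSR storage with the number of stored entries.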

Now suppose I have a dense 1d array of size 3 (or 50000 in my real-life case) whose components are all non-zero:

>>> d = np.ones(3) * 3
>>> d
array([ 3.,  3.,  3.])

I would like to compute the elementwise multiplication of a and d using the usual broadcasting semantics of numpy. However, sparse matrices in scipy follow np.matrix semantics: the '*' operator is overloaded to behave like matrix multiplication instead of elementwise multiplication:

>>> a * d
array([ 0., -3.,  0.,  0.,  6.])
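One way to confirm that '*' is a matrix product here is to compare it with the explicit matrix-multiplication operator '@', which scipy's sparse matrix classes also support on reasonably recent SciPy versions (an assumption if you are on a very old release). Both give a 1d vector of length 5, i.e. a matrix-vector product, not a 5x3 elementwise result:

```python
import numpy as np
import scipy.sparse as ssp

# Same toy matrix and vector as in the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.ones(3) * 3

star = a * d     # for scipy.sparse *matrix* classes, '*' is a matrix product
matmul = a @ d   # the explicit matrix-product operator

print(star.shape)                 # a length-5 vector, not a 5x3 matrix
print(np.allclose(star, matmul))  # both operators agree
```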

One solution would be to switch 'a' to array semantics for the '*' operator, which gives the expected result:

>>> a.toarray() * d
array([[ 0.,  0.,  0.],
       [ 0.,  0., -3.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  6.,  0.]])

But I cannot do that, since the call to toarray() would materialize the dense version of 'a', which does not fit in memory (and the result would be dense too):

>>> ssp.issparse(a.toarray())
False

Any idea how to build this while keeping only sparse data structures, and without resorting to an inefficient Python loop over the columns of 'a'?

Solution

I replied over at scipy.org as well, but I thought I should add an answer here, in case others find this page when searching.

You can turn the vector into a sparse diagonal matrix and then use matrix multiplication (with *) to get the same result as broadcasting, while keeping everything sparse.
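Why the diagonal trick matches broadcasting: write D = diag(d) for the n × n matrix with d_j at position (j, j) and zeros elsewhere. In every entry of the product, only the k = j term of the sum survives:

```latex
(aD)_{ij} = \sum_{k} a_{ik} D_{kj} = \sum_{k} a_{ik}\, d_k\, \delta_{kj} = a_{ij}\, d_j
```

This is exactly numpy's broadcasting of a 1d d across the rows of a: column j is scaled by d_j. Because D is diagonal, aD can only have stored entries where a does, so the result is at most as dense as a.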

>>> d = ssp.lil_matrix((3,3))
>>> d.setdiag(np.ones(3)*3)
>>> a*d
<5x3 sparse matrix of type '<type 'numpy.float64'>'
 with 2 stored elements in Compressed Sparse Row format>
>>> (a*d).todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  0., -3.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  6.,  0.]])
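On current SciPy versions the same idea can be written a little more directly with `scipy.sparse.diags`, which builds the diagonal matrix in one call, and the `@` operator. The second function is a swapped-in alternative, not part of the original answer: in CSR format, `indices[k]` is the column of the stored value `data[k]`, so multiplying `data` by `d[indices]` gives the same column scaling without building D at all. The function names are hypothetical; treat this as a sketch, not a tuned implementation.

```python
import numpy as np
import scipy.sparse as ssp


def scale_columns_diag(a, d):
    """Return a @ diag(d), i.e. column j of a multiplied by d[j], kept sparse."""
    D = ssp.diags(d)                # n x n sparse diagonal matrix with d on the diagonal
    return ssp.csr_matrix(a) @ D    # sparse @ sparse stays sparse


def scale_columns_csr(a, d):
    """Same result by rescaling the stored CSR values directly (alternative technique)."""
    out = ssp.csr_matrix(a).copy()  # copy so the caller's matrix is left untouched
    # indices[k] is the column index of data[k]; assumes a float dtype so the
    # in-place multiply by a float d does not hit a casting error.
    out.data *= d[out.indices]
    return out


# Toy example from the question.
a = ssp.lil_matrix((5, 3))
a[1, 2] = -1
a[4, 1] = 2
d = np.ones(3) * 3

r1 = scale_columns_diag(a, d)
r2 = scale_columns_csr(a, d)
print(ssp.issparse(r1), ssp.issparse(r2))
print(r1.toarray())
```

Both results are sparse and never store more entries than a, so memory stays proportional to the number of stored entries rather than to rows × columns.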

Hope that helps!


07-25 06:07