Improving the multiplication performance of SciPy sparse matrices

The sparse matrix multiplication routines are coded directly in C++, and as far as a quick look at the source reveals, there doesn't seem to be a hook into any optimized library. They also don't seem to take advantage of the fact that the second matrix is a vector to minimize calculations. So you can probably speed things up quite a bit by accessing the guts of the sparse matrix and customizing the multiplication algorithm. The following code does so in pure Python/NumPy, and if the vector really has "a few non-null points" it matches the speed of SciPy's C++ code. If you implemented it in Cython, the speed increase should be noticeable:

    import numpy as np
    import scipy.sparse as sps

    def sparse_col_vec_dot(csc_mat, csc_vec):
        # row numbers and values of the vector's non-zero entries
        v_rows = csc_vec.indices
        v_data = csc_vec.data
        # matrix description arrays
        m_dat = csc_mat.data
        m_ind = csc_mat.indices
        m_ptr = csc_mat.indptr
        # number of stored entries in each selected matrix column,
        # turned into offsets into the output arrays
        sizes = m_ptr.take(v_rows + 1) - m_ptr.take(v_rows)
        sizes = np.concatenate(([0], np.cumsum(sizes)))
        # output arrays; use a dtype that can hold the products
        out_dtype = np.result_type(csc_mat.dtype, csc_vec.dtype)
        data = np.empty((sizes[-1],), dtype=out_dtype)
        indices = np.empty((sizes[-1],), dtype=np.intp)
        indptr = np.zeros((2,), dtype=np.intp)

        for j in range(len(sizes) - 1):
            # stored entries of matrix column v_rows[j], scaled by v_data[j]
            start, stop = m_ptr[v_rows[j]], m_ptr[v_rows[j] + 1]
            np.multiply(m_dat[start:stop], v_data[j],
                        out=data[sizes[j]:sizes[j + 1]])
            indices[sizes[j]:sizes[j + 1]] = m_ind[start:stop]
        indptr[-1] = len(data)
        # the result has one row per matrix row and a single column
        ret = sps.csc_matrix((data, indices, indptr),
                             shape=(csc_mat.shape[0], 1))
        # merge entries that landed on the same row
        ret.sum_duplicates()
        return ret

A quick explanation of what is going on: a CSC matrix is defined by three linear arrays:

- data contains the non-zero entries, stored in column-major order.
- indices contains the rows of the non-zero entries.
- indptr has one entry more than the number of columns of the matrix, and the items in column j are found in data[indptr[j]:indptr[j+1]] and are in rows indices[indptr[j]:indptr[j+1]].

So to multiply by a sparse column vector, you can iterate over data and indices of the column vector, and for each (d, r) pair, extract the corresponding column of the matrix and multiply it by d, i.e. data[indptr[r]:indptr[r+1]] * d and indices[indptr[r]:indptr[r+1]]. Several columns can contribute to the same row, so the concatenated result may contain duplicate row indices; the final sum_duplicates() call adds those together.
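To make the CSC layout and the per-column step concrete, here is a minimal sketch on a made-up 3x3 matrix. The matrix, the vector and all variable names are illustrative assumptions, not part of the original answer. For brevity it accumulates the scaled columns into a dense array instead of building a sparse result, then checks the outcome against SciPy's own product.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical example matrix, chosen only for illustration:
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
dense = np.array([[1, 0, 2],
                  [0, 0, 3],
                  [4, 5, 0]], dtype=float)
mat = sps.csc_matrix(dense)

# Column-major storage: column 0 holds (1, 4), column 1 holds (5,),
# column 2 holds (2, 3).
mat_data = mat.data        # non-zero values, column by column
mat_indices = mat.indices  # row number of each stored value
mat_indptr = mat.indptr    # where each column starts and stops in the arrays above

# Hypothetical sparse column vector with non-zeros in rows 0 and 2.
vec = sps.csc_matrix(np.array([[2.0], [0.0], [-1.0]]))

# For each (d, r) pair of the vector, scale column r of the matrix by d
# and add it to the result. Row indices are unique within one column
# (the matrix was built from a dense array), so the fancy-indexed += is safe.
result = np.zeros(mat.shape[0])
for d, r in zip(vec.data, vec.indices):
    start, stop = mat_indptr[r], mat_indptr[r + 1]
    result[mat_indices[start:stop]] += d * mat_data[start:stop]

# Reference value from SciPy's built-in sparse product.
expected = mat.dot(vec).toarray().ravel()
```

Only columns 0 and 2 of the matrix are ever touched here, because those are the rows where the vector is non-zero. That is where the savings come from when the vector has few non-zero entries.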