Problem description
When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.
from scipy import sparse

def store_sparse_matrix(self, M):
    # Convert to CSR once, then store its three arrays under group 'M'.
    csr, h5 = M.tocsr(), self.getFileHandle()
    grp1 = h5.create_group(self.getGroup(), 'M')
    h5.create_array(grp1, 'data', csr.data)
    h5.create_array(grp1, 'indptr', csr.indptr)
    h5.create_array(grp1, 'indices', csr.indices)

def get_sparse_matrix(self):
    # Read the arrays back from disk and rebuild the CSR matrix.
    grp = self.getGroup().M
    return sparse.csr_matrix((grp.data[:], grp.indices[:], grp.indptr[:]))
The trouble is that the get_sparse_matrix function takes some time (reading from disk) and, if I understand it correctly, also requires the data to fit into memory.
The only other option seems to be to convert the matrix to dense format (a numpy array) and then use pytables normally. However, this seems rather inefficient, although I suppose pytables might handle the compression itself?
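On the compression question, PyTables can compress chunked datasets transparently when it is given a tables.Filters object. The following is a rough sketch, assuming PyTables 3.x with the zlib compressor available; the file name and node name are illustrative. Note that the full dense array is still built in memory before it is written here.

```python
import os
import tempfile

import numpy as np
import tables
from scipy import sparse

# A very sparse example: 1000 nonzeros out of 1,000,000 entries.
matrix = sparse.identity(1000, format="csr")
dense = matrix.toarray()

# Ask PyTables to compress the chunks of the on-disk array with zlib.
filters = tables.Filters(complevel=5, complib="zlib")
path = os.path.join(tempfile.mkdtemp(), "dense_demo.h5")  # illustrative path
with tables.open_file(path, mode="w") as h5:
    h5.create_carray(h5.root, "dense", obj=dense, filters=filters)

# Slicing the node reads only the requested rows from disk.
with tables.open_file(path, mode="r") as h5:
    first_rows = h5.root.dense[:10, :]

dense_bytes = dense.nbytes          # size of the dense array in memory
file_bytes = os.path.getsize(path)  # size of the compressed file on disk
```

Comparing dense_bytes with file_bytes gives a feel for how much the long runs of zeros compress. Because slices are read on demand, the on-disk dense form does not have to fit in memory when it is read back, although the conversion with toarray() above does.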
Recommended answer
Borrowing from "Storing numpy sparse matrix in HDF5 (PyTables)", you can marshal a scipy.sparse matrix (in CSR or CSC form) into a pytables format using its data, indices, and indptr attributes, which are three regular numpy.ndarray objects.
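As a minimal sketch of that approach, assuming PyTables 3.x and a CSR matrix, the three arrays can be written to a group and read back as shown here. The helper names save_csr and load_csr, the file name, and the group name M are hypothetical and not part of either library. The shape is stored as a fourth array, which is an addition to the three attributes named above: without it, trailing all-zero columns would be lost when the matrix is rebuilt.

```python
import os
import tempfile

import numpy as np
import tables
from scipy import sparse


def save_csr(h5file, where, name, matrix):
    """Store a sparse matrix as CSR component arrays (hypothetical helper)."""
    csr = matrix.tocsr()
    group = h5file.create_group(where, name)
    h5file.create_array(group, "data", csr.data)
    h5file.create_array(group, "indices", csr.indices)
    h5file.create_array(group, "indptr", csr.indptr)
    # Keep the shape so trailing all-zero columns survive the round trip.
    h5file.create_array(group, "shape", np.array(csr.shape))
    return group


def load_csr(h5file, where):
    """Rebuild the CSR matrix; all three arrays are read into memory."""
    data = h5file.get_node(where, "data").read()
    indices = h5file.get_node(where, "indices").read()
    indptr = h5file.get_node(where, "indptr").read()
    shape = tuple(int(n) for n in h5file.get_node(where, "shape").read())
    return sparse.csr_matrix((data, indices, indptr), shape=shape)


# Small example whose last column is entirely zero.
matrix = sparse.csr_matrix(np.array([[0.0, 1.5, 0.0, 0.0],
                                     [0.0, 0.0, 0.0, 0.0],
                                     [2.0, 0.0, 3.0, 0.0]]))

path = os.path.join(tempfile.mkdtemp(), "sparse_demo.h5")  # illustrative path
with tables.open_file(path, mode="w") as h5:
    save_csr(h5, h5.root, "M", matrix)

with tables.open_file(path, mode="r") as h5:
    restored = load_csr(h5, "/M")
```

This stores only the nonzero values and their index arrays, so the file stays small for very sparse matrices, but load_csr still pulls the whole matrix into memory.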