Which is more efficient with PyTables: scipy.sparse or a dense numpy matrix?

Problem description


When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g.

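from scipy import sparse

def store_sparse_matrix(self, M):
    # Convert to CSR once, then store its three component arrays and the shape
    # (PyTables 3.x snake_case API; older releases used createGroup/createArray).
    csr = M.tocsr()
    h5 = self.getFileHandle()
    grp1 = h5.create_group(self.getGroup(), 'M')
    for name in ('data', 'indptr', 'indices'):
        h5.create_array(grp1, name, getattr(csr, name))
    grp1._v_attrs.shape = csr.shape

def get_sparse_matrix(self):
    grp1 = self.getGroup().M
    # .read() loads each stored array from disk into memory.
    return sparse.csr_matrix((grp1.data.read(), grp1.indices.read(), grp1.indptr.read()), shape=tuple(grp1._v_attrs.shape))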


The trouble is that get_sparse_matrix takes some time (it reads from disk) and, if I understand it correctly, also requires the whole matrix to fit into memory.


The only other option seems to be converting the matrix to a dense format (a numpy array) and then using PyTables normally. However, this seems rather inefficient, although perhaps PyTables handles the compression itself?
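On the compression question: PyTables can compress chunked arrays (such as a CArray) through tables.Filters, so a mostly-zero dense matrix can take far less space on disk than in memory. The sketch below assumes PyTables 3.x with zlib; the helper names store_dense_compressed and read_dense_rows, the file name and the matrix size are illustrative only. Note that compression saves disk space, not RAM: reading the whole array back still produces a full dense numpy array.

```python
import os
import tempfile

import numpy as np
import tables


def store_dense_compressed(path, dense, complevel=5, complib="zlib"):
    """Write a 2-D array to a chunked, compressed CArray at /M (hypothetical helper)."""
    filters = tables.Filters(complevel=complevel, complib=complib)
    with tables.open_file(path, mode="w") as h5:
        # obj= lets PyTables infer the atom (dtype) and shape from the array itself.
        h5.create_carray(h5.root, "M", obj=dense, filters=filters)


def read_dense_rows(path, start, stop):
    """Read rows [start, stop); only the chunks overlapping them are decompressed."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.M[start:stop]


# Illustrative data: a 1000 x 1000 float64 matrix with at most 1000 nonzeros.
rng = np.random.default_rng(0)
dense = np.zeros((1000, 1000))
rows = rng.integers(0, 1000, size=1000)
cols = rng.integers(0, 1000, size=1000)
dense[rows, cols] = rng.random(1000)

path = os.path.join(tempfile.mkdtemp(), "dense.h5")
store_dense_compressed(path, dense)
print("in memory:", dense.nbytes, "bytes; on disk:", os.path.getsize(path), "bytes")
```

Slicing the CArray avoids loading the entire matrix, but every chunk that is read is still expanded to dense form, and CPU time goes into compressing and decompressing runs of zeros that a CSR layout never stores.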

Recommended answer


Borrowing from "Storing numpy sparse matrix in HDF5 (PyTables)", you can marshal a scipy.sparse matrix into a PyTables format using its data, indices, and indptr attributes, which are three regular numpy.ndarray objects.
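A self-contained round trip of that idea, as a minimal sketch assuming PyTables 3.x and SciPy. The helper names store_csr, load_csr and load_csr_rows are hypothetical. Storing the shape as a group attribute is an addition to the answer: without it, csr_matrix infers the column count from the largest stored column index, so trailing all-zero columns would be lost. Because CSR keeps each row's entries contiguous, load_csr_rows reads only the slice of data/indices that belongs to a row range, which partly addresses the memory concern in the question.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
import tables


def store_csr(h5, where, name, matrix):
    """Store a sparse matrix as a group holding CSR data/indices/indptr plus its shape."""
    csr = sp.csr_matrix(matrix)  # convert once, whatever the input format
    grp = h5.create_group(where, name)
    for attr in ("data", "indices", "indptr"):
        h5.create_array(grp, attr, getattr(csr, attr))
    grp._v_attrs.shape = csr.shape
    return grp


def load_csr(grp):
    """Read the whole matrix back into memory."""
    return sp.csr_matrix(
        (grp.data.read(), grp.indices.read(), grp.indptr.read()),
        shape=tuple(int(n) for n in grp._v_attrs.shape),
    )


def load_csr_rows(grp, start, stop):
    """Read only rows [start, stop); assumes 0 <= start <= stop <= number of rows."""
    indptr = grp.indptr[start:stop + 1]  # row pointers for the requested rows only
    lo, hi = int(indptr[0]), int(indptr[-1])
    n_cols = int(grp._v_attrs.shape[1])
    return sp.csr_matrix(
        (grp.data[lo:hi], grp.indices[lo:hi], indptr - lo),  # re-base pointers to 0
        shape=(stop - start, n_cols),
    )


# Illustrative usage with a random 1000 x 500 matrix at 1% density.
m = sp.random(1000, 500, density=0.01, format="csr", random_state=0)
path = os.path.join(tempfile.mkdtemp(), "sparse.h5")
with tables.open_file(path, mode="w") as h5:
    store_csr(h5, h5.root, "M", m)
with tables.open_file(path, mode="r") as h5:
    full = load_csr(h5.root.M)
    part = load_csr_rows(h5.root.M, 100, 200)
```

Both readers return ordinary in-memory csr_matrix objects, so they remain usable after the file is closed.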


10-18 23:17