Converting a Pandas sparse DataFrame to a sparse matrix without generating a dense matrix in memory

Question

Is there a way to convert a pandas.SparseDataFrame to a scipy.sparse.csr_matrix without generating a dense matrix in memory?

scipy.sparse.csr_matrix(df.values)

doesn't work, because df.values first generates a dense matrix, which is then cast to a csr_matrix.
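To make the problem concrete, here is a minimal sketch. It assumes a recent pandas (1.0 or later), where SparseDataFrame no longer exists and sparse data lives in ordinary DataFrame columns with a sparse dtype; the matrix size and the names source, dense_values, and via_dense are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Build a DataFrame with sparse columns directly from a scipy matrix.
source = sparse.random(200, 30, density=0.05, format="csr", random_state=0)
df = pd.DataFrame.sparse.from_spmatrix(source)

# .values returns a plain NumPy array, so every cell (200 * 30) is allocated.
dense_values = df.values

# Only after that dense allocation are the zeros dropped again.
via_dense = sparse.csr_matrix(dense_values)
```

The result is correct, but the intermediate dense_values array is exactly the allocation the question wants to avoid.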

Thanks in advance!

Recommended answer

The pandas docs describe an experimental conversion to scipy sparse, SparseSeries.to_coo:

http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse

================

Edit: SparseSeries.to_coo is a special function that works from a multiindex, not from a data frame. See the other answers for the data frame case. Note the difference in dates.
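As a rough sketch of the multiindex route: in recent pandas (an assumption; SparseSeries itself was removed in 1.0) the same conversion is reached through the Series.sparse accessor. The level names "row" and "col" and the values are invented for the example.

```python
import numpy as np
import pandas as pd

# Each index tuple says which (row, column) cell the value belongs to.
idx = pd.MultiIndex.from_tuples([(0, 1), (1, 2), (2, 0)], names=["row", "col"])
ss = pd.Series([1.0, 2.0, 3.0], index=idx).astype(pd.SparseDtype("float64", np.nan))

# One index level becomes the matrix rows, the other the columns.
A, row_labels, col_labels = ss.sparse.to_coo(
    row_levels=["row"], column_levels=["col"], sort_labels=True
)
```

A is a scipy COO matrix; row_labels and col_labels record which index values each matrix row and column correspond to.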

============

As of 0.20.0, there is an sdf.to_coo() for sparse data frames as well as the multiindex ss.to_coo() for sparse series. Since a sparse matrix is inherently 2d, it makes sense to require a multiindex for the (effectively) 1d data series, whereas a data frame can already represent a table or 2d array.
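A minimal sketch of the data frame route follows. It assumes pandas 1.0 or later, where sdf.to_coo() is no longer available and the equivalent call is df.sparse.to_coo() on a DataFrame whose columns all have a sparse dtype; the sizes and variable names are illustrative only.

```python
import pandas as pd
from scipy import sparse

# Start from sparse data so no dense array exists at any point in the example.
original = sparse.random(1000, 50, density=0.01, format="csr", random_state=0)
df = pd.DataFrame.sparse.from_spmatrix(original)

# to_coo() assembles a COO matrix from the stored (non-fill) values of each column.
coo = df.sparse.to_coo()

# COO to CSR is a sparse-to-sparse conversion.
csr = coo.tocsr()
```

The csr_matrix asked for in the question is then just the .tocsr() of the COO result.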

When I first responded to this question (June 2015), this sparse dataframe/series feature was experimental.

