Problem description
Is there a way to convert a pandas.SparseDataFrame to a scipy.sparse.csr_matrix without generating a dense matrix in memory?
scipy.sparse.csr_matrix(df.values) doesn't work, because df.values generates a dense matrix, which is only then cast to a csr_matrix.
Thanks in advance!
Recommended answer
The pandas docs describe an experimental conversion to scipy sparse, SparseSeries.to_coo:
http://pandas-docs.github.io/pandas-docs-travis/sparse.html#interaction-with-scipy-sparse
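For readers on a recent pandas: SparseSeries was removed in pandas 1.0, and the same conversion is exposed through the .sparse accessor on a Series with a sparse dtype. The following is a minimal sketch under that assumption; the level names "row" and "col" and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# A two-level MultiIndex: one level supplies row labels, the other column labels.
# The level names "row" and "col" are illustrative, not required by pandas.
index = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 2), (1, 1), (2, 0)], names=["row", "col"]
)
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# Give the Series a sparse dtype (NaN as fill value) so that .sparse is available.
ss = s.astype(pd.SparseDtype("float64", np.nan))

# to_coo returns the COO matrix plus the labels used for its rows and columns.
coo, row_labels, col_labels = ss.sparse.to_coo(
    row_levels=["row"], column_levels=["col"], sort_labels=True
)

# COO to CSR is a sparse-to-sparse conversion; no dense array is created.
csr = coo.tocsr()
```

Only the stored values are passed to scipy, so the matrix is never materialized densely.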
================
edit - SparseSeries.to_coo is a special function that works from a multiindex, not from a data frame. See the other answers for the data frame case. Note the difference in dates.
============
As of 0.20.0, there is an sdf.to_coo() and a multiindex ss.to_coo(). Since a sparse matrix is inherently 2d, it makes sense to require a multiindex for the (effectively) 1d data series, while a dataframe can already represent a table or 2d array.
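As a rough sketch of the data frame route on a recent pandas: SparseDataFrame was removed in pandas 1.0, so this example assumes the replacement API, a regular DataFrame with sparse columns and the DataFrame.sparse accessor. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Start from a small scipy sparse matrix so nothing dense is built along the way.
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 0])
vals = np.array([1.0, 2.0, 3.0])
original = sparse.csr_matrix((vals, (rows, cols)), shape=(3, 3))

# A DataFrame whose columns are sparse arrays (fill value 0).
df = pd.DataFrame.sparse.from_spmatrix(original, columns=["a", "b", "c"])

# Convert back: sparse DataFrame -> COO -> CSR, staying sparse throughout.
coo = df.sparse.to_coo()
csr = coo.tocsr()
```

Unlike scipy.sparse.csr_matrix(df.values), this path hands only the stored entries to scipy.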
When I first responded to this question (June 2015), this sparse dataframe/series feature was experimental.