Problem description
I want to convert a sparse matrix (156060 x 11780) to a DataFrame, but I get a MemoryError. This is my code:
vect = TfidfVectorizer(sublinear_tf=True, analyzer='word',
                       stop_words='english', tokenizer=tokenize,
                       strip_accents='ascii')
X = vect.fit_transform(df.pop('Phrase')).toarray()
for i, col in enumerate(vect.get_feature_names()):
    df[col] = X[:, i]
The error is raised at the line X = vect.fit_transform(df.pop('Phrase')).toarray(). How can I solve it?
Recommended answer
Try this:
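import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# `df` and `tokenize` are the objects defined in the question.
# np.float32 halves memory versus the default float64; np.float16 may not be accepted by scipy.sparse.
vect = TfidfVectorizer(sublinear_tf=True, analyzer='word', stop_words='english',
                       tokenizer=tokenize,
                       strip_accents='ascii', dtype=np.float32)
X = vect.fit_transform(df.pop('Phrase'))  # NOTE: `.toarray()` was removed, so X stays sparse
# pd.SparseSeries was removed in pandas 1.0; pd.arrays.SparseArray replaces it.
# On scikit-learn older than 1.0, call get_feature_names() instead.
for i, col in enumerate(vect.get_feature_names_out()):
    df[col] = pd.arrays.SparseArray(X[:, i].toarray().ravel(), fill_value=0)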
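
Adding thousands of columns one at a time can be slow, so it may be worth building all the sparse columns in a single step. The following is a minimal sketch of that idea, not part of the original answer: it assumes a pandas version that provides `pd.DataFrame.sparse.from_spmatrix` (0.25 or later), the helper name `add_tfidf_columns` and the toy data are hypothetical, and the question's custom `tokenize` function is left out because its definition is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def add_tfidf_columns(df, text_col):
    """Return a copy of df with text_col replaced by sparse TF-IDF columns."""
    vect = TfidfVectorizer(sublinear_tf=True, stop_words="english",
                           strip_accents="ascii", dtype=np.float32)
    X = vect.fit_transform(df[text_col])  # scipy sparse matrix, never densified
    # get_feature_names_out() needs scikit-learn >= 1.0; fall back on older versions.
    if hasattr(vect, "get_feature_names_out"):
        names = vect.get_feature_names_out()
    else:
        names = vect.get_feature_names()
    # Wrap the sparse matrix in a DataFrame whose columns use a sparse dtype.
    tfidf = pd.DataFrame.sparse.from_spmatrix(X, index=df.index, columns=names)
    return pd.concat([df.drop(columns=[text_col]), tfidf], axis=1)


# Hypothetical toy data standing in for the question's `df`.
toy = pd.DataFrame({
    "Sentiment": [1, 0, 2],
    "Phrase": ["good movie", "bad plot", "good plot twist"],
})
result = add_tfidf_columns(toy, "Phrase")

# Size of the dense float64 array that `.toarray()` would have to allocate
# for the matrix shape given in the question.
dense_bytes = 156060 * 11780 * np.dtype(np.float64).itemsize
print(f"dense array: {dense_bytes / 1e9:.1f} GB")
```

The last lines show why the original call fails: a dense float64 array of 156060 x 11780 values needs about 14.7 GB, while the sparse representation stores only the non-zero entries.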