Python: Gensim memory error

Problem description

import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
from gensim import corpora, models, similarities
from nltk.corpus import stopwords
import codecs

documents = []
with codecs.open("Master_File_for_Docs.txt", encoding = 'utf-8', mode= "r") as fid:
   for line in fid:
       documents.append(line)
# Build the stop word set once; set membership tests are faster than list lookups
stoplist = set(stopwords.words('english'))

# Remove stop words
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]


dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = models.LdaModel(corpus, id2word=dictionary, num_topics=100)
lda.print_topics(20)
#corpus_lda = lda[corpus]
#for doc in corpus_lda:
 #   print(doc)

I am running Gensim for topic modeling and trying to get the code above to work. I know the code itself works, because a friend ran it successfully on a Mac, but when I run it on a Windows computer it fails with a

MemoryError

Also, the logging that I configured on the second line does not appear on my Windows computer. Is there something in Windows that I need to fix in order for Gensim to work?

Recommended answer

The MemoryError appears because Gensim is trying to keep all of the data it needs in memory while analyzing it. The options are limited:

  • Use a machine with more memory (an AWS instance, or anything more powerful than your PC).
  • Try a 64-bit Python interpreter. A 32-bit process can address only a few gigabytes, no matter how much RAM is installed.
  • Try reducing the size parameter. Note that size is an argument of the Word2Vec constructor (vector_size in Gensim 4 and later), not of model.save(); for the LdaModel in the question, the comparable setting is num_topics. Either way, the model ends up with fewer features.
  • Try increasing the min_count parameter, which is also a Word2Vec constructor argument rather than a model.save() one. It makes the model consider only words that appear at least min_count times. For the Dictionary in the question, dictionary.filter_extremes(no_below=...) has a similar effect.
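Since the same script ran on a Mac but not on Windows, it may be worth checking first whether the Windows interpreter is a 32-bit build. The following is a minimal sketch using only the standard library; the helper name interpreter_bits is made up for this illustration.

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter (32 or 64)."""
    # struct.calcsize("P") is the size in bytes of a C pointer in this build.
    return struct.calcsize("P") * 8


bits = interpreter_bits()
print(f"Running a {bits}-bit Python interpreter")

# Cross-check: sys.maxsize only exceeds 2**32 on a 64-bit build.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If this reports 32 bits, installing a 64-bit Python build is probably the cheapest thing to try before changing the model.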

Be careful though: the last two suggestions in the list change the characteristics of your model.
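As a rough illustration of how the memory footprint of the question's script could be reduced, the sketch below streams the file instead of building the documents, texts and corpus lists, prunes the vocabulary, and trains fewer topics. It is not part of the original answer. The names iter_tokens, StreamedCorpus and train_lda, the tiny STOPLIST, and the default values for num_topics, no_below and keep_n are all assumptions chosen for the example; the last three change the model, as noted above.

```python
# Hypothetical stand-in for the NLTK stop word list used in the question.
STOPLIST = frozenset({"the", "a", "an", "and", "of", "to", "in"})


def iter_tokens(path, stoplist=STOPLIST):
    """Yield one token list per line, reading the file lazily."""
    with open(path, encoding="utf-8") as fid:
        for line in fid:
            yield [w for w in line.lower().split() if w not in stoplist]


class StreamedCorpus:
    """Re-iterable bag-of-words corpus that holds one document at a time."""

    def __init__(self, path, dictionary):
        self.path = path
        self.dictionary = dictionary

    def __iter__(self):
        # Each pass re-reads the file, so nothing is cached in memory.
        for tokens in iter_tokens(self.path):
            yield self.dictionary.doc2bow(tokens)


def train_lda(path, num_topics=20, no_below=5, keep_n=50000):
    """Train LDA from a streamed corpus (parameter values are assumptions)."""
    # Imported here so the streaming helpers above work without Gensim.
    from gensim import corpora, models

    dictionary = corpora.Dictionary(iter_tokens(path))
    # Drop rare and very common words, and cap the vocabulary size.
    dictionary.filter_extremes(no_below=no_below, no_above=0.5, keep_n=keep_n)
    corpus = StreamedCorpus(path, dictionary)
    return models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
```

Calling train_lda("Master_File_for_Docs.txt") would then stand in for the list-building part of the original script. Because StreamedCorpus has no length, Gensim will likely make an extra pass over the file to count documents, trading some run time for lower memory use.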

