Problem description
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
from gensim import corpora, models, similarities
from nltk.corpus import stopwords
import codecs

documents = []
with codecs.open("Master_File_for_Docs.txt", encoding='utf-8', mode="r") as fid:
    for line in fid:
        documents.append(line)

stoplist = []
x = stopwords.words('english')
for word in x:
    stoplist.append(word)

# Removes stopwords
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=100)
lda.print_topics(20)

#corpus_lda = lda[corpus]
#for doc in corpus_lda:
#    print(doc)
I am running Gensim for topic modeling and trying to get the above code working. I know the code works because a friend ran it successfully on a Mac, but when I run it on a Windows computer it raises a
MemoryError
The logging that I set up on the second line also doesn't appear on my Windows computer. Is there something in Windows that I need to fix in order for Gensim to work?
Recommended answer
The MemoryError appears because Gensim tries to keep all of the data it needs in memory while analyzing it. The options are limited:
- Use a machine with more memory (an AWS instance, or anything more powerful than your PC).
- Try a 64-bit Python interpreter.
- Try reducing the size parameter. This leaves fewer features representing your words. (In Gensim, size is an argument of the Word2Vec constructor, not of model.save(); for the LdaModel in the question, the closest equivalent is num_topics.)
- Try increasing the min_count parameter. This makes the model consider only words that appear at least min_count times. (This is also a Word2Vec constructor argument, not an argument of model.save(); for a Dictionary-based corpus like the one in the question, dictionary.filter_extremes(no_below=...) prunes rare words in a similar way.)
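The question does not say which interpreter was installed on the Windows machine, so it may be worth checking before changing anything else. A minimal sketch using only the standard library (the helper name `interpreter_bits` is made up for this example):

```python
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running Python interpreter, in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = interpreter_bits()
    print(f"Python {sys.version.split()[0]} is a {bits}-bit build")
    if bits == 32:
        # A 32-bit process has a limited address space no matter how much
        # RAM is installed, so large corpora can fail with MemoryError.
        print("Consider installing a 64-bit Python interpreter.")
```

If this reports 32 bits, switching to a 64-bit build is probably the cheapest of the suggestions above to try first, since it does not change the model.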
Be careful, though: the last two suggestions (reducing size and increasing min_count) will change the characteristics of your model.
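To make that trade-off concrete, here is a pure-Python illustration of what a minimum-count filter does to a tokenized corpus. It is only a sketch, not Gensim's implementation: the function `prune_rare_tokens` and the toy `texts` are hypothetical, and Gensim's own `Dictionary.filter_extremes(no_below=...)` prunes by document frequency rather than by total occurrences.

```python
from collections import Counter


def prune_rare_tokens(texts, min_count):
    """Drop tokens that occur fewer than min_count times across all texts."""
    counts = Counter(token for text in texts for token in text)
    return [[token for token in text if counts[token] >= min_count]
            for text in texts]


# Toy corpus, invented for this example.
texts = [
    ["topic", "model", "memory"],
    ["topic", "model", "windows"],
    ["topic", "error"],
]

pruned = prune_rare_tokens(texts, min_count=2)

vocab_before = {token for text in texts for token in text}
vocab_after = {token for text in pruned for token in text}
print(len(vocab_before), "->", len(vocab_after))  # prints: 5 -> 2
```

The vocabulary shrinks, which is what saves memory, but the words "memory", "windows", and "error" are no longer visible to the model at all.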