Problem description
/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars convergence = np.fabs((bound - old_bound) / old_bound)
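# Dynamic topic model
from collections import Counter

from gensim import corpora
from gensim.models import ldaseqmodel


def run_dtm(num_topics=18):
    # preprocessing() is defined elsewhere (not shown); it returns the tokenized docs, their years, and titles
    docs, years, titles = preprocessing(datasetType=2)
    # Re-sort the documents by year
    Z = sorted(zip(years, docs), key=lambda pair: pair[0])
    years_new, docs_new = zip(*Z)
    # Generate the time slices: document counts per year, in chronological order
    # (Counter.values() alone does not guarantee that order)
    year_counts = Counter(years_new)
    time_slice = [year_counts[year] for year in sorted(year_counts)]
    for year in sorted(year_counts):
        print('{} --- {}'.format(year, year_counts[year]))
    print('********* data set loaded ********')
    dictionary = corpora.Dictionary(docs_new)
    corpus = [dictionary.doc2bow(text) for text in docs_new]
    print('********* train lda seq model ********')
    ldaseq = ldaseqmodel.LdaSeqModel(corpus=corpus, id2word=dictionary, time_slice=time_slice, num_topics=num_topics)
    print('********* lda seq model done ********')
    ldaseq.print_topics(time=1)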
Hi everyone, I am using the dynamic topic model in the gensim package for topic analysis, following this tutorial: https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/ldaseqmodel.ipynb. However, I always get the same unexpected error. Can anyone give me some guidance? I'm really puzzled, even though I have tried several different datasets for generating the corpus and dictionary. The error is like this:
/Users/Barry/anaconda/lib/python2.7/site-packages/gensim/models/ldaseqmodel.py:217: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
Recommended answer
This is an issue in the source code of ldaseqmodel.py itself. With the latest gensim package (version 3.8.3), I get the same warning at line 293:
ldaseqmodel.py:293: RuntimeWarning: divide by zero encountered in double_scalars
convergence = np.fabs((bound - old_bound) / old_bound)
If you look at the source around that line, you will find the convergence computation quoted in the warning.
Here the code divides the difference between bound and old_bound by old_bound (which is also visible in the warning).
If you look further, you will see that at line 263 old_bound is initialized to zero. That is the main reason you get the "divide by zero encountered" warning.
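The effect can be reproduced outside gensim. The following is a minimal sketch, assuming NumPy is installed; the function name relative_change and the numeric bound values are made up for illustration and are not taken from gensim. It applies the same expression as the warning, first with old_bound equal to zero and then with a nonzero old_bound.

```python
import warnings

import numpy as np


def relative_change(bound, old_bound):
    # Same expression as the line quoted in the warning
    return np.fabs((bound - old_bound) / old_bound)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # First pass: old_bound is still zero, so NumPy warns and returns inf
    first_iteration = relative_change(np.float64(-1234.5), 0)
    # Later pass: old_bound holds the previous bound, so the result is finite
    later_iteration = relative_change(np.float64(-1200.0), np.float64(-1234.5))

print(first_iteration)   # inf
print(later_iteration)
for w in caught:
    print(w.category.__name__, w.message)
```

In this sketch only the first call triggers a RuntimeWarning, which matches the idea that the warning comes from the zero starting value rather than from the data.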
To check this, I put a print statement at line 294:
print('bound = {}, old_bound = {}'.format(bound, old_bound))
The output I received was: [screenshot not included]
So, in short, you are getting this warning because of the source code of ldaseqmodel.py in the package, not because of any empty documents. However, if you do not remove the empty documents from your corpus, you will receive a different warning. I therefore suggest removing any empty documents from your corpus and simply ignoring the division-by-zero warning above.
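Both suggestions can be sketched with the standard library alone. The helper names drop_empty_documents and run_ignoring_divide_by_zero are hypothetical, and the sketch assumes the corpus is a list of bag-of-words documents ordered by time slice, as in the question. Removing documents changes the per-slice counts, so the first helper recomputes time_slice as well.

```python
import warnings


def drop_empty_documents(corpus, time_slice):
    """Drop empty bag-of-words documents and shrink each slice count to match."""
    if sum(time_slice) != len(corpus):
        raise ValueError("time_slice must sum to the number of documents")
    kept_corpus, kept_slices = [], []
    start = 0
    for size in time_slice:
        kept = [doc for doc in corpus[start:start + size] if len(doc) > 0]
        kept_corpus.extend(kept)
        kept_slices.append(len(kept))
        start += size
    return kept_corpus, kept_slices


def run_ignoring_divide_by_zero(train):
    """Call train() while hiding only the divide-by-zero RuntimeWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="divide by zero encountered",
            category=RuntimeWarning,
        )
        return train()


# Toy corpus: two slices holding 2 and 3 documents, two of them empty
corpus = [[(0, 1)], [], [(1, 2), (2, 1)], [(0, 3)], []]
time_slice = [2, 3]
clean_corpus, clean_slices = drop_empty_documents(corpus, time_slice)
print(clean_corpus)
print(clean_slices)  # [1, 2]
```

Training could then be wrapped as run_ignoring_divide_by_zero(lambda: ldaseqmodel.LdaSeqModel(corpus=clean_corpus, id2word=dictionary, time_slice=clean_slices, num_topics=num_topics)). If a slice ends up with zero documents after filtering, it is probably safer to drop or merge that slice before training.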