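In an LDA model, these are two ways to use an existing model to infer topics for a new document. What is the difference between the two?
Accepted answer
I ran some tests with an ldamodel that has 8 topics. Here are my results:
Two unseen documents to predict topics for: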
list_unseenTw=[['hope', 'miley', 'blow', 'peopl', 'mind', 'tonight', 'gain', 'million', 'fan'],['@mileycyrustour', "we'r", 'think', "it'", 'pretti', 'cool', 'miley', 'saturday', 'night', 'live', 'tonight', '#prettycool']]
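Prediction with lda[doc_bow] (this already returns the share of each matching topic as a proportion):
doc_bow = [dictionary.doc2bow(text) for text in list_unseenTw]
prediction = ldamodel[doc_bow]
prediction[0]: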
[(0,0.02509002728802024),
(1,0.0250114373070437),
(2,0.025040162139306051),
(3,0.82462688228515812),
(4,0.025150924341817767),
(5,0.025000027675139792),
(6,0.025000024127660267),
(7,0.025080514835853926)]
prediction[1]:
[(0,0.031250011319462589),
(1,0.03125001371820222),
(2,0.031250019639505598),
(3,0.031250015093378707),
(4,0.031250019670816337),
(5,0.031250024860739675),
(6,0.78124988084026048),
(7,0.031250014854016454)]
Prediction with ldamodel.inference (the result is expressed as unnormalized weights rather than proportions):
pred = ldamodel.inference(doc_bow)
print(pred)
(array([[0.12545023, 0.1250572, 0.12520085, 4.12309694, 0.12579184, 0.12500014, 0.12500012, 0.12540268],
[0.12500005, 0.12500005, 0.12500008, 0.12500006, 0.12500008, 0.125000001, 3.12499952, 0.12500006]]), None)
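As you can see, the first prediction (doc 1) gives the same result as before (topic 3) once its weight is divided by the sum of the row:
total = 0
for i in pred[0][0]:
    total += i
4.12309694 / total  # = 0.82462, i.e. about 82.5%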
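To make the relationship concrete, here is a minimal sketch that normalizes both weight rows from the ldamodel.inference output and picks the dominant topic for each document. It uses plain Python and hard-codes the numbers printed in this answer, so it does not need a trained model; the names `gamma`, `normalize`, and `proportions` are made up for illustration. As far as I can tell, lda[doc_bow] performs this same normalization internally and then drops topics below a minimum probability, but treat that as an assumption and check it against your gensim version.

```python
# Weight rows copied from the ldamodel.inference(doc_bow) output in this answer.
gamma = [
    [0.12545023, 0.1250572, 0.12520085, 4.12309694,
     0.12579184, 0.12500014, 0.12500012, 0.12540268],
    [0.12500005, 0.12500005, 0.12500008, 0.12500006,
     0.12500008, 0.125000001, 3.12499952, 0.12500006],
]


def normalize(row):
    """Scale a row of non-negative weights so that it sums to 1."""
    total = sum(row)
    return [w / total for w in row]


# One list of topic proportions per document.
proportions = [normalize(row) for row in gamma]

for doc_id, row in enumerate(proportions):
    # Index of the largest proportion is the dominant topic.
    top_topic = max(range(len(row)), key=row.__getitem__)
    print(doc_id, top_topic, round(row[top_topic], 4))
```

The dominant topics come out as topic 3 (about 0.82) for the first document and topic 6 (about 0.78) for the second, which agrees with the lda[doc_bow] predictions to within a small numerical difference.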
Regarding "python - What is the difference between lda[doc_bow] and lda.inference(corpus)?", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/27145452/