Problem description
I've built a parallel topic model using MALLET.
I want to get the top words for each document.
To do that, I'm trying to get a word-topic probability matrix.
How would I achieve this?
Recommended answer
When you train topics with MALLET, you can pass an option called "--word-topic-counts-file" together with a file name. MALLET then writes one line per word to that file: the word's index, the word itself, and a list of topic:count pairs for the topics the word was assigned to. Note that these are raw counts rather than probabilities, so you need to normalize them yourself. You can later read this file in C, Java, or R (or, of course, any other language) to build the matrix you want.
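As a rough illustration of that last step, here is a minimal Python sketch that turns such a file into per-topic word probabilities. It assumes each line looks like `0 apple 3:12 7:2` (word index, word, then topic:count pairs); the function names and the sample lines are made up for the example and are not part of MALLET. The normalization is a plain count ratio with no smoothing, so the values will differ slightly from estimates that include the model's beta prior.

```python
from collections import defaultdict


def parse_word_topic_counts(lines):
    """Parse lines like '0 apple 3:12 7:2' into {topic: {word: count}}."""
    counts = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        word = fields[1]
        for pair in fields[2:]:
            topic, count = pair.split(":")
            counts[int(topic)][word] = int(count)
    return counts


def word_given_topic(counts):
    """Normalize counts within each topic to estimate P(word | topic)."""
    probs = {}
    for topic, word_counts in counts.items():
        total = sum(word_counts.values())
        probs[topic] = {w: c / total for w, c in word_counts.items()}
    return probs


# Hypothetical sample data standing in for the contents of the counts file.
sample = ["0 apple 0:3 1:1", "1 banana 1:3", "2 cherry 0:1"]
probs = word_given_topic(parse_word_topic_counts(sample))
```

To use it on real output, pass an open file handle instead of `sample`, for example `parse_word_topic_counts(open("word-topic-counts.txt"))`, where the file name is whatever you gave to the option. Sorting each topic's dictionary by value then gives the top words per topic.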