Problem description
I am using the Mallet API to extract topics from Twitter data, and the topics I have extracted so far look reasonable. However, I am having trouble estimating K, the number of topics.
For example, I varied K from 10 to 100, so I now have models with different numbers of topics for the same data. Now I would like to estimate which K is best. Some of the approaches I know of are:
- Perplexity
- Empirical likelihood
- Marginal likelihood (harmonic mean method)
- Silhouette
I found the method model.estimate(), which can be run with different values of K, but I don't see how to determine which value of K is best for the model. Could anyone give me some ideas, ideally with sample code? Thanks.
Recommended answer
I think the best algorithm is human judgement. Create topic models with different numbers of topics, look at them, and take the one you like. Sometimes you will want to fine-tune the number of topics (say, you don't want a certain topic to be split in two, or you want a certain topic to stay separate rather than be merged into another one).
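If you want a number to narrow down the candidates before inspecting them by hand, held-out perplexity (the first item in the question's list) is one option. The following is only a rough sketch and does not call Mallet: the class KSelection and its methods are hypothetical names, and the log-likelihood values in main are made-up placeholders, not results. It assumes you already have, for each K, a natural-log likelihood of the same held-out documents. I believe Mallet can produce such a value through the MarginalProbEstimator returned by ParallelTopicModel.getProbEstimator(), but check the Javadoc for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: ranks candidate topic counts by held-out perplexity. */
final class KSelection {

    /**
     * Perplexity = exp(-logLikelihood / tokenCount); lower is better.
     * Assumes the log-likelihood is in natural-log units.
     */
    static double perplexity(double heldOutLogLikelihood, long heldOutTokens) {
        if (heldOutTokens <= 0) {
            throw new IllegalArgumentException("heldOutTokens must be positive");
        }
        return Math.exp(-heldOutLogLikelihood / heldOutTokens);
    }

    /**
     * Returns the K whose model has the lowest held-out perplexity.
     * Every entry must be scored on the same held-out documents.
     */
    static int bestK(Map<Integer, Double> logLikelihoodByK, long heldOutTokens) {
        if (logLikelihoodByK.isEmpty()) {
            throw new IllegalArgumentException("no candidate K values given");
        }
        int best = -1;
        double bestPerplexity = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, Double> entry : logLikelihoodByK.entrySet()) {
            double p = perplexity(entry.getValue(), heldOutTokens);
            if (p < bestPerplexity) {
                bestPerplexity = p;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only; replace them with the
        // held-out log-likelihoods measured for your own models.
        Map<Integer, Double> logLikelihoodByK = new LinkedHashMap<>();
        logLikelihoodByK.put(10, -52000.0);
        logLikelihoodByK.put(20, -50500.0);
        logLikelihoodByK.put(50, -51200.0);
        long heldOutTokens = 8000L;

        for (Map.Entry<Integer, Double> entry : logLikelihoodByK.entrySet()) {
            System.out.println("K=" + entry.getKey() + " perplexity="
                    + perplexity(entry.getValue(), heldOutTokens));
        }
        System.out.println("Lowest perplexity at K="
                + bestK(logLikelihoodByK, heldOutTokens));
    }
}
```

Treat the K this picks as a starting point only. A lower perplexity does not guarantee topics that read well, so I would still look at the top words of the few best-scoring models before settling on one.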