Problem description
I've built an LDA topic model in R using the textmineR package. It looks as follows.
library(textmineR)
library(stringr)  # for str_length()

## get textmineR dtm
dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
                  ngram_window = c(1, 2),
                  doc_names = dat2$names,
                  stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                  lower = TRUE, # lowercase - this is the default value
                  remove_punctuation = TRUE, # punctuation - this is the default
                  remove_numbers = TRUE, # numbers - this is the default
                  verbose = TRUE,
                  cpus = 4)

# keep terms that occur more than twice and are longer than two characters
dtm2 <- dtm2[, colSums(dtm2) > 2]
dtm2 <- dtm2[, str_length(colnames(dtm2)) > 2]
############################################################
## RUN & EXAMINE TOPIC MODEL
############################################################
# Set the seed so the sampler's results are reproducible
set.seed(34838)
model2 <- FitLdaModel(dtm = dtm2,
                      k = 8,
                      iterations = 500,
                      burnin = 200,
                      alpha = 0.1,
                      beta = 0.05,
                      optimize_alpha = TRUE,
                      calc_likelihood = TRUE,
                      calc_coherence = TRUE,
                      calc_r2 = TRUE,
                      cpus = 4)
The questions are then:
1. Which function should I use to get perplexity scores in the textmineR package? I can't seem to find one.
2. How do I measure perplexity scores for different numbers of topics (k)?
Recommended answer
As asked: there's no way to calculate perplexity with textmineR unless you explicitly program it yourself. TBH, I've never seen any value from perplexity that you couldn't get from likelihood and coherence, so I didn't implement it.
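For anyone who does want to program it themselves, here is a minimal sketch using the usual definition, exp(-log-likelihood / total token count). The function name `calc_perplexity` is hypothetical and not part of textmineR. It assumes `theta` is a documents x topics matrix and `phi` a topics x terms matrix, each with rows summing to 1, which is how a textmineR model stores them.

```r
# Hypothetical helper (not part of textmineR): perplexity of a
# document-term matrix under a fitted topic model.
#   dtm:   documents x terms count matrix (dense or Matrix sparse)
#   theta: documents x topics, rows sum to 1
#   phi:   topics x terms, rows sum to 1
calc_perplexity <- function(dtm, theta, phi) {
  dtm <- as.matrix(dtm)  # densifies sparse input; only OK for small data
  stopifnot(nrow(dtm) == nrow(theta),
            ncol(theta) == nrow(phi),
            ncol(dtm) == ncol(phi))
  p_word <- theta %*% phi  # P(term | document), documents x terms
  nz <- dtm > 0            # only observed terms contribute
  log_lik <- sum(dtm[nz] * log(p_word[nz]))
  exp(-log_lik / sum(dtm))
}

# Toy check: 2 documents, 2 terms, a single topic that is uniform over terms,
# so every word has probability 0.5 and the perplexity is 2
toy_dtm   <- matrix(c(1, 2, 3, 4), nrow = 2)
toy_theta <- matrix(1, nrow = 2, ncol = 1)
toy_phi   <- matrix(0.5, nrow = 1, ncol = 2)
calc_perplexity(toy_dtm, toy_theta, toy_phi)
```

With a real model the call would look like `calc_perplexity(dtm2, model2$theta, model2$phi)`. Converting a large DTM to a dense matrix can exhaust memory, so treat this as an illustration of the formula rather than something to run on a big corpus.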
However, the text2vec package does have an implementation. See the example below:
library(textmineR)

# model ships with textmineR as example
m <- nih_sample_topic_model

# dtm ships with textmineR as example
d <- nih_sample_dtm

# get perplexity
p <- text2vec::perplexity(X = d,
                          topic_word_distribution = m$phi,
                          doc_topic_distribution = m$theta)
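For the second question (perplexity for different numbers of topics), one possible approach is to fit one model per candidate k and run the same `text2vec::perplexity()` call on each. The sketch below assumes textmineR and text2vec are both installed. The `k_values` and the iteration counts are arbitrary choices for illustration, not recommendations.

```r
library(textmineR)

d <- nih_sample_dtm        # example DTM that ships with textmineR
k_values <- c(5, 10, 20)   # candidate topic counts (arbitrary choice)

set.seed(34838)
perplexity_by_k <- sapply(k_values, function(k) {
  # fit one LDA model per k; short runs just to keep the example fast
  m_k <- FitLdaModel(dtm = d,
                     k = k,
                     iterations = 200,
                     burnin = 150,
                     alpha = 0.1,
                     beta = 0.05)
  # in-sample perplexity of the training DTM under this model
  text2vec::perplexity(X = d,
                       topic_word_distribution = m_k$phi,
                       doc_topic_distribution = m_k$theta)
})

data.frame(k = k_values, perplexity = perplexity_by_k)
```

Note that this is perplexity on the same documents the model was trained on, which tends to keep falling as k grows. Scoring a held-out set of documents would give a fairer comparison across values of k.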