r - qdap ngram polarity dictionary

Dear Stack Overflow crowd,

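I managed to use qdap's polarity function to calculate the polarity of some blog entries, loading my own dictionary based on SentiWS. Now I have a new sentiment lexicon (SePL) that contains not only single words but also phrases. One example is "simply good", where "simply" is neither a negator nor an amplifier but makes the expression more precise. So I was wondering whether I could search for ngrams using qdap's polarity function.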

As an example:

library(qdap)
phrase <- "This is simply the best"
key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
counts(polarity(phrase, polarity.frame=key))

gives:

  all wc polarity    pos.words neg.words                text.var
1 all  5    0.179 simply, best         - This is simply the best

However, I would like to get output like this:

  all wc polarity    pos.words neg.words                text.var
1 all  5    0.76 simply the best         - This is simply the best

Does anyone have an idea how to get it working like that?

All the best,
Ben

Best answer

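This is a bug that was reintroduced in the bag_o_word function earlier this year. It is the second time such a bug has affected ngram polarity since I enabled the use of ngrams in the polarity.frame: https://github.com/trinker/qdap/issues/185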

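I have fixed the bug and added a unit test to make sure it does not creep back into the code. With qdap 2.2.1 your code now matches the whole phrase as you wanted, although the caveat about the algorithm's original intent still applies: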

> library(qdap)
> phrase <- "This is simply the best"
> key <- sentiment_frame(c("simply", "best", "simply the best"), "", c(0.1,0.3,0.8))
> counts(polarity(phrase, polarity.frame=key))

  all wc polarity       pos.words neg.words                text.var
1 all  5    0.358 simply the best         - This is simply the best

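The algorithm used by qdap's polarity function was not designed to operate this way. You can use the following hack to make it work, but be aware that it goes beyond the intent of the underlying theory used in the function's algorithm: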

library(qdap)
phrase <- "This is simply the best"

terms <- c("simply", "best", "simply the best")
key <- sentiment_frame(space_fill(terms, terms, sep="xxx"), NULL, c(0.1,0.3,0.8))

counts(polarity(space_fill(phrase, terms, "xxx"), polarity.frame=key))

##   all wc polarity           pos.words neg.words                    text.var
## 1 all  3    0.462 simplyxxxthexxxbest         - This is simplyxxxthexxxbest
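The polarity values in this thread are consistent with the summed weights of the matched terms divided by the square root of the word count, which is how I understand the qdap documentation to describe the case with no negators or amplifiers nearby. The following is a minimal base R check of that arithmetic, not qdap's actual implementation; `simple_polarity` is a hypothetical helper name introduced only for illustration:

```r
# Hypothetical helper, not part of qdap: summed term weights over sqrt(word count).
# Assumes no negators, amplifiers, or de-amplifiers near the matched terms.
simple_polarity <- function(weights, wc) sum(weights) / sqrt(wc)

# "simply" and "best" matched as separate terms in a 5-word sentence
round(simple_polarity(c(0.1, 0.3), 5), 3)  # 0.179

# "simply the best" matched as one term, word count still 5
round(simple_polarity(0.8, 5), 3)          # 0.358

# space_fill hack: the joined phrase counts as one word, so wc drops to 3
round(simple_polarity(0.8, 3), 3)          # 0.462
```

This also suggests why the hack raises the score: joining the phrase with "xxx" reduces the word count from 5 to 3, so the same weight of 0.8 is divided by a smaller number.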

Regarding r - qdap ngram polarity dictionary, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27156834/