r - Sentiment analysis in R

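I am new to sentiment analysis and have no idea how to go about it in R, so I am looking for help and guidance.

I have a data set made up of opinions and would like to analyse them.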

Title      Date            Content
Boy        May 13 2015     "She is pretty", Tom said.
Animal     June 14 2015    The penguin is cute, lion added.
Human      March 09 2015   Mr Koh predicted that every human is smart..
Monster    Jan 22 2015     Ms May, a student, said that John has $10.80.

Thank you.

Best answer

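Sentiment analysis covers a broad class of methods for measuring positive versus negative sentiment in text, so this question is hard to answer in general. Here is a simple answer, though: apply a dictionary to your document-term matrix, then combine the dictionary's positive and negative key categories into a single sentiment measure.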
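To make the first half of that idea concrete, here is a minimal base R sketch of the counting step, with no packages required. It is an illustration under stated assumptions, not the approach of any particular package: the helper names `tokenize` and `count_key`, the tokenization rule, and the small word lists are all hypothetical, and the documents are three rows from the question.

```r
# Three documents taken from the Content column in the question.
docs <- c(
  Boy    = '"She is pretty", Tom said.',
  Animal = "The penguin is cute, lion added.",
  Human  = "Mr Koh predicted that every human is smart.."
)

# Hypothetical mini-dictionary; "*" is a glob-style wildcard.
dict <- list(
  positive = c("pretty", "cute", "smart*"),
  negative = c("bad*", "awful*", "terrib*")
)

# Lowercase a document and split it into word tokens (assumed rule:
# anything that is not a letter or apostrophe separates tokens).
tokenize <- function(x) {
  words <- strsplit(tolower(x), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Count the tokens that match any pattern listed under one dictionary key.
count_key <- function(tokens, patterns) {
  regex <- paste(utils::glob2rx(patterns), collapse = "|")
  sum(grepl(regex, tokens))
}

# Rows are documents, columns are dictionary keys.
counts <- t(sapply(docs, function(d) {
  toks <- tokenize(d)
  sapply(dict, count_key, tokens = toks)
}))

counts
```

The result is a small documents-by-keys count matrix, which is the same shape of object that a dictionary lookup on a document-term matrix produces.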

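I suggest trying this in the text analysis package quanteda, which handles a variety of existing dictionary formats and lets you build very flexible custom dictionaries.

For example: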

# Note: this uses the quanteda API as of 2015; later releases renamed
# several of these objects and functions (e.g. inaugCorpus became
# data_corpus_inaugural).
require(quanteda)
# Keep only the inaugural addresses delivered after 1980
mycorpus <- subset(inaugCorpus, Year > 1980)
# Define a two-key dictionary; "*" is a wildcard
mydict <- dictionary(list(negative = c("detriment*", "bad*", "awful*", "terrib*", "horribl*"),
                          positive = c("good", "great", "super*", "excellent")))
# Build a document-feature matrix with one count column per dictionary key
myDfm <- dfm(mycorpus, dictionary = mydict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 2 keys
##    ... created a 9 x 2 sparse dfm
##    ... complete.
## Elapsed time: 0.057 seconds.
myDfm
## Document-feature matrix of: 9 documents, 2 features.
## 9 x 2 sparse Matrix of class "dfmSparse"
##               features
## docs           negative positive
##   1981-Reagan         0        6
##   1985-Reagan         0        6
##   1989-Bush           0       18
##   1993-Clinton        1        2
##   1997-Clinton        2        8
##   2001-Bush           1        6
##   2005-Bush           0        8
##   2009-Obama          2        3
##   2013-Obama          1        3

# Use a LIWC dictionary; this requires having the dictionary file locally
liwcdict <- dictionary(file = "LIWC2001_English.dic", format = "LIWC")
myDfmLIWC <- dfm(mycorpus, dictionary = liwcdict)
## Creating a dfm from a corpus ...
##    ... lowercasing
##    ... tokenizing
##    ... indexing documents: 9 documents
##    ... indexing features: 3,113 feature types
##    ... applying a dictionary consisting of 68 keys
##    ... created a 9 x 68 sparse dfm
##    ... complete.
## Elapsed time: 1.844 seconds.
myDfmLIWC[, grep("^Pos|^Neg", features(myDfmLIWC))]
## Document-feature matrix of: 9 documents, 4 features.
## 9 x 4 sparse Matrix of class "dfmSparse"
##               features
## docs           Negate Posemo Posfeel Negemo
##   1981-Reagan      46     89       5     24
##   1985-Reagan      28    104       7     33
##   1989-Bush        40    102      10      8
##   1993-Clinton     25     51       3     23
##   1997-Clinton     27     64       5     22
##   2001-Bush        40     80       6     27
##   2005-Bush        25    117       5     31
##   2009-Obama       40     83       5     46
##   2013-Obama       42     80      13     22
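The remaining step, combining the positive and negative categories into one measure, could look like the base R sketch below. The Posemo and Negemo counts are copied from the LIWC output above. The formula (positive minus negative, divided by their sum) is one common choice picked here for illustration, not something the answer prescribes, and the data frame name `liwc` and column name `net` are hypothetical.

```r
# Posemo and Negemo counts copied from the LIWC output above.
liwc <- data.frame(
  doc    = c("1981-Reagan", "1985-Reagan", "1989-Bush", "1993-Clinton",
             "1997-Clinton", "2001-Bush", "2005-Bush", "2009-Obama",
             "2013-Obama"),
  Posemo = c(89, 104, 102, 51, 64, 80, 117, 83, 80),
  Negemo = c(24, 33, 8, 23, 22, 27, 31, 46, 22),
  stringsAsFactors = FALSE
)

# Net sentiment (assumed measure): +1 when every emotion word is
# positive, -1 when every emotion word is negative.
liwc$net <- (liwc$Posemo - liwc$Negemo) / (liwc$Posemo + liwc$Negemo)

# List the speeches from most to least positive on this measure.
liwc[order(-liwc$net), c("doc", "net")]
```

Because the score is a ratio, it is not inflated by longer speeches, but it ignores negation and context, so treat it as a rough first pass.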

For your corpus, assuming you have it in a data.frame called data, you can create a quanteda corpus using:

mycorpus <- corpus(data$Content, docvars = data[, 1:2])

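See also ?textfile for loading content from files in one easy command. (In later quanteda releases this functionality moved to the separate readtext package.) It works with .csv files, for instance, although you may run into problems with your file because the Content field contains text with commas.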
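One way around the comma problem is to make sure the Content field is wrapped in double quotes in the .csv file. The base R sketch below shows this under assumptions: the CSV text and the object name `dat` are hypothetical, and the two rows come from the question. With a real file you would pass its path to `read.csv()` instead of `text =`.

```r
# Hypothetical CSV text holding two rows from the question.
# The Content field is quoted because it contains commas.
csv_text <- 'Title,Date,Content
Animal,June 14 2015,"The penguin is cute, lion added."
Monster,Jan 22 2015,"Ms May, a student, said that John has $10.80."'

# read.csv() treats double-quoted fields as single values by default,
# so the commas inside Content do not split the column.
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the result: three columns, with Content intact.
str(dat)
```

A data frame read this way has the Title, Date, and Content columns that the corpus() call above expects.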

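There are of course many other ways to measure sentiment, but if you are new to sentiment mining and R, that should get you started. You can read more about sentiment mining methods (and apologies if you have already come across them) in: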

Liu, Bing. 2010. "Sentiment Analysis and Subjectivity." Handbook of Natural Language Processing 2:627–66.

Liu, Bing. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. Cambridge University Press.

For more on r - Sentiment analysis in R, see the similar question on Stack Overflow: https://stackoverflow.com/questions/32598912/