Question

I would like someone to correct my understanding of how VADER scores text. I've read an explanation of this process here, but when I recreate the process it describes, I cannot match the compound score of test sentences to VADER's output. Let's say we have the sentence:

"I like using VADER, its a fun tool to use"

The words VADER picks up are 'like' (+1.5) and 'fun' (+2.3). According to the documentation, these values are summed (giving +3.8) and then normalized to a range between 0 and 1 using the following function:

(alpha = 15)
x / (x^2 + alpha)

With our numbers, this should become:

3.8 / (14.44 + 15) = 0.1291

VADER, however, returns the following compound score:

Scores: {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.7003}

Where am I going wrong in my reasoning? Similar questions have been asked several times, but none of them has been answered with an actual worked example of VADER classifying a sentence. Any help would be appreciated.

Recommended answer

Only your normalization is wrong: the denominator is the square root of (score^2 + alpha), not (score^2 + alpha) itself. This is how the function is defined in the code:

import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    # Note the square root in the denominator
    norm_score = score / math.sqrt((score * score) + alpha)
    return norm_score

So you have 3.8 / sqrt(3.8 * 3.8 + 15) = 3.8 / sqrt(29.44) = 0.7003, which matches the compound score VADER reports.
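To check the arithmetic, here is a minimal sketch that compares the formula from the question with the one from the answer. It does not call NLTK. The `valences` dictionary is hard-coded from the two word scores quoted in the question, and `mistaken_normalize` is a hypothetical name for the version without the square root. It also assumes a plain sum of word scores is enough for this sentence, that is, that no other adjustment to the word scores (for negation, intensifiers, or punctuation, say) comes into play.

```python
import math


def normalize(score, alpha=15):
    """Normalization as given in the answer: score / sqrt(score^2 + alpha)."""
    return score / math.sqrt(score * score + alpha)


def mistaken_normalize(score, alpha=15):
    """The question's version, which leaves out the square root."""
    return score / (score * score + alpha)


# Word scores quoted in the question (assumed, not read from the lexicon here).
valences = {"like": 1.5, "fun": 2.3}
total = sum(valences.values())

print(round(total, 1))                      # 3.8
print(round(mistaken_normalize(total), 4))  # 0.1291
print(round(normalize(total), 4))           # 0.7003
```

Because sqrt(score^2 + alpha) is always larger than |score|, the result of `normalize` stays strictly between -1 and 1, which is the range named in the function's docstring rather than the 0 to 1 range assumed in the question.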
