Problem description

I am trying to calculate the semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), the Lin measure (LIN), the Jiang and Conrath measure (JNC) and the Banerjee and Pedersen measure (BNP).

To do that, I am using nltk and WordNet 3.0. Next, I want to combine the similarity values obtained from the different measures. For that I need to normalize the similarity values, because some measures give values between 0 and 1 while others give values greater than 1.

So, my question is: how do I normalize the similarity values obtained from different measures?

Extra detail on what I am actually trying to do: I have a set of words. I calculate the pairwise similarity between the words and remove the words that are not strongly related to the other words in the set.
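The filtering step described above can be sketched in Python as follows. This is a minimal sketch under assumptions of my own: `sim` is any symmetric similarity function with values in [0, 1], a word counts as "strongly related" when its mean similarity to the other words reaches a threshold, and the names `filter_weakly_related`, `toy_sim`, `VEHICLES` and the 0.3 threshold are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List

# A similarity function maps a pair of words to a score in [0, 1].
Similarity = Callable[[str, str], float]


def filter_weakly_related(words: List[str], sim: Similarity, threshold: float) -> List[str]:
    """Keep the words whose mean similarity to the other words is >= threshold."""
    if len(words) < 2:
        return list(words)  # nothing to compare against
    totals: Dict[str, float] = {w: 0.0 for w in words}
    # Score each unordered pair once; this assumes sim(w, u) == sim(u, w).
    for w, u in combinations(words, 2):
        score = sim(w, u)
        totals[w] += score
        totals[u] += score
    n_others = len(words) - 1
    return [w for w in words if totals[w] / n_others >= threshold]


# Hypothetical toy similarity: vehicles are related to each other, nothing else is.
VEHICLES = {"car", "bus", "truck"}


def toy_sim(w: str, u: str) -> float:
    return 0.8 if w in VEHICLES and u in VEHICLES else 0.1


print(filter_weakly_related(["car", "bus", "truck", "banana"], toy_sim, 0.3))
# ['car', 'bus', 'truck']
```

Using the mean is only one way to define "strongly related"; a minimum or a count of neighbours above a cut-off would plug into the same loop.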

Recommended answer

How to normalize a single measure

Let's consider a single arbitrary similarity measure M and take an arbitrary word w.

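Define m = M(w, w). Then m is the largest value M can give for w, since no word should be more similar to w than w itself. This holds, for example, for Resnik and Lin, but check it for every measure you use: for Jiang and Conrath, M(w, w) is infinite in theory, because the distance between a word and itself is 0.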

Let MN denote the normalized version of M.

For any two words w, u you can compute MN(w, u) = M(w, u) / m.

It's easy to see that if M takes non-negative values and never exceeds m, then MN takes values in [0, 1].
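A minimal Python sketch of this normalization, assuming a measure is any function of two words that returns a non-negative score, and that M(w, u) never exceeds M(w, w). The names `normalize`, `toy_measure` and the numbers in `TOY_SCORES` are hypothetical, made-up stand-ins for an unbounded measure such as Resnik; they are not real WordNet values.

```python
from typing import Callable

# A similarity measure maps a pair of words to a non-negative score.
Measure = Callable[[str, str], float]


def normalize(measure: Measure) -> Measure:
    """Return MN with MN(w, u) = M(w, u) / M(w, w)."""

    def normalized(w: str, u: str) -> float:
        m = measure(w, w)  # self-similarity of w, assumed to be M's maximum for w
        if m <= 0:
            return 0.0  # guard against dividing by zero for degenerate words
        return measure(w, u) / m

    return normalized


# Made-up scores standing in for an unbounded measure (not real WordNet output).
TOY_SCORES = {
    ("car", "car"): 8.0,
    ("bus", "bus"): 10.0,
    ("car", "bus"): 6.0,
}


def toy_measure(w: str, u: str) -> float:
    # Symmetric lookup; unknown pairs score 0.
    return TOY_SCORES.get((w, u), TOY_SCORES.get((u, w), 0.0))


mn = normalize(toy_measure)
print(mn("car", "bus"))  # 0.75
print(mn("bus", "car"))  # 0.6
```

Because the divisor M(w, w) depends on w, MN is not symmetric (0.75 versus 0.6 above). If you need a symmetric score, dividing by max(M(w, w), M(u, u)) is one option; that is my own variation, not part of the answer.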

To build your own measure F from k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the method above, and then define weights:

alpha_1, alpha_2, ..., alpha_k

such that alpha_i denotes the weight of the i-th measure.

All alphas must be non-negative and sum to 1, i.e.:

alpha_1 + alpha_2 + ... + alpha_k = 1

Then compute the combined measure for a pair of words w, u as:

F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u)

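Since each normalized m_i takes values in [0, 1] and F is a weighted average of them with non-negative weights summing to 1, F also takes values in [0, 1].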
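The weighted combination can be sketched in Python as below, assuming each m_i has already been normalized into [0, 1]. `combine`, `m_1` and `m_2` are hypothetical illustrations rather than real WordNet measures; the check on the weights enforces the non-negativity and sum-to-1 conditions that keep F inside [0, 1].

```python
import math
from typing import Callable, Sequence

# A normalized similarity measure maps a pair of words to a score in [0, 1].
Measure = Callable[[str, str], float]


def combine(measures: Sequence[Measure], alphas: Sequence[float]) -> Measure:
    """Return F(w, u) = alpha_1 * m_1(w, u) + ... + alpha_k * m_k(w, u)."""
    if len(measures) != len(alphas):
        raise ValueError("need exactly one weight per measure")
    if any(a < 0 for a in alphas) or not math.isclose(sum(alphas), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    def combined(w: str, u: str) -> float:
        return sum(a * m(w, u) for a, m in zip(alphas, measures))

    return combined


# Hypothetical normalized measures: 1.0 for identical words, a fixed score otherwise.
def m_1(w: str, u: str) -> float:
    return 1.0 if w == u else 0.5


def m_2(w: str, u: str) -> float:
    return 1.0 if w == u else 0.25


F = combine([m_1, m_2], [0.5, 0.5])
print(F("car", "bus"))  # 0.375
print(F("car", "car"))  # 1.0
```

`math.isclose` is used instead of `== 1.0` so that weights such as 0.1, 0.2 and 0.7, whose floating-point sum may not be exactly 1, are still accepted.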

10-19 04:43