Semantic relatedness algorithms in Python


Question

I want to find the relatedness between two synsets, and I came across many algorithms such as Resnik, Lin, Wu-Palmer, path similarity, Leacock-Chodorow, etc. Can somebody tell me which one is the most efficient among these algorithms?

Recommended answer

From a "show me an example" perspective, here's an example showing how you can use semantic similarity to perform word sense disambiguation (WSD):

from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def max_wupa(context_sentence, ambiguous_word):
    r"""
    WSD by Maximizing Wu-Palmer Similarity.

    Perform WSD by maximizing the sum of maximum Wu-Palmer score between possible
    synsets of all words in the context sentence and the possible synsets of the
    ambiguous words (see http://goo.gl/XMq2BI):
    {argmax}_{synset(a)}(\sum_{i}^{n}{{max}_{synset(i)}(Wu-Palmer(i,a))})

    Wu-Palmer (1994) similarity is based on depth in the taxonomy; the similarity
    between two synsets accounts for the depth of each synset and the depth of
    their least common subsumer. (see http://acl.ldc.upenn.edu/P/P94/P94-1019.pdf)
    """
    result = {}
    for sense in wn.synsets(ambiguous_word):
        # For every context token, keep its best Wu-Palmer score against this
        # sense. wup_similarity can return None, which is treated as 0 here.
        result[sense] = sum(
            max([sense.wup_similarity(k) or 0 for k in wn.synsets(token)] + [0])
            for token in word_tokenize(context_sentence))
    # Sort on the score only, so tied scores never force a Synset comparison.
    return sorted(((score, sense) for sense, score in result.items()),
                  key=lambda pair: pair[0], reverse=True)

bank_sents = ['I went to the bank to deposit my money',
              'The river bank was full of dead fishes']
ans = max_wupa(bank_sents[0], 'bank')
print(ans)
# definition is a method in NLTK 3; older releases exposed it as an attribute.
print(ans[0][1].definition())

(Source: pyWSD @ github)
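To make the Wu-Palmer score itself more concrete, here is a minimal sketch that computes it on a tiny hand-made taxonomy instead of WordNet. The taxonomy, the node names and the helper functions (`path_to_root`, `depth`, `least_common_subsumer`, `wu_palmer`) are all hypothetical and exist only for illustration. It assumes the usual formulation, 2 * depth(LCS) / (depth(a) + depth(b)), with depth counted in nodes so that the root has depth 1.

```python
# Hypothetical toy taxonomy (child -> parent); this is NOT WordNet data.
PARENT = {
    "entity": None,
    "organization": "entity",
    "financial_institution": "organization",
    "bank_institution": "financial_institution",
    "land": "entity",
    "slope": "land",
    "river_bank": "slope",
    "money": "entity",
}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))

def least_common_subsumer(a, b):
    """Deepest ancestor shared by `a` and `b` (a node is its own ancestor)."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    return None

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = least_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("bank_institution", "financial_institution"))
print(wu_palmer("bank_institution", "river_bank"))
```

In this toy tree, two nodes that share only the root still get a non-zero score, which is worth keeping in mind when such scores are summed over a whole sentence.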

Use the above code with care, because you need to consider:

What is really happening when we try to maximize path similarity between all possible synsets of all tokens in the context sentence and the possible synsets of the ambiguous word?
Is the maximization even logical if most of the path similarities yield None, and by chance you get some rogue word that has a synset related to one of the synsets of the ambiguous word?
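One way to probe these two questions is to look at how each sense's total is made up, rather than only at the final ranking. The sketch below is a hypothetical diagnostic, not part of pyWSD: `summarize` and the score table are invented for illustration, and the numbers are made up rather than taken from WordNet. For each sense it reports the summed score, how many tokens contributed anything at all, and what share of the total comes from the single strongest token.

```python
def summarize(best_scores):
    """Summarize per-token contributions for each candidate sense.

    `best_scores` maps sense -> {token: best similarity of that token to the sense}.
    """
    summary = {}
    for sense, per_token in best_scores.items():
        total = sum(per_token.values())
        nonzero_tokens = sum(1 for score in per_token.values() if score > 0)
        top_token = max(per_token, key=per_token.get) if per_token else None
        top_share = per_token[top_token] / total if total > 0 else 0.0
        summary[sense] = {
            "total": total,
            "nonzero_tokens": nonzero_tokens,
            "top_token": top_token,
            "top_share": top_share,
        }
    return summary

# Invented scores: sense_a is supported by a single "rogue" token only,
# sense_b by several weaker tokens.
EXAMPLE_SCORES = {
    "sense_a": {"went": 0.0, "deposit": 0.0, "money": 0.0, "fishes": 0.9},
    "sense_b": {"went": 0.2, "deposit": 0.25, "money": 0.3, "fishes": 0.0},
}

for sense, stats in summarize(EXAMPLE_SCORES).items():
    print(sense, stats)
```

With these made-up numbers, sense_a wins on the summed score even though only one token supports it. That is the situation the second question warns about.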
