Question
I'm trying to compare terms/expressions that may or may not be semantically related. These are not full sentences, and not necessarily single words. For example:
'Social networking service' and 'Social network' are clearly strongly related, but how do I quantify this using NLTK?
Clearly I'm missing something, as even this code:
from nltk.corpus import wordnet
w1 = wordnet.synsets('social network')
returns an empty list.
Any advice on how to tackle this?
Recommended answer
There are some measures of semantic relatedness or similarity, but as far as I know they are better defined for single words or single expressions in WordNet's lexicon, not for compounds of WordNet's lexical entries.
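One way to work around this, offered only as a rough sketch and not as an established method, is to score individual words with a WordNet measure and then aggregate over the words of each expression. The code below assumes NLTK with the WordNet corpus downloaded, uses Wu-Palmer similarity (`wup_similarity`) as the word-level measure, and splits on whitespace. The function names `wordnet_word_similarity` and `phrase_similarity` and the averaging heuristic are illustrative assumptions, not part of NLTK.

```python
from itertools import product


def wordnet_word_similarity(word_a, word_b):
    """Best Wu-Palmer similarity over all synset pairs of two single words.

    Returns 0.0 when either word has no synsets or no pair is comparable.
    """
    # Imported here so the aggregation below can be tried without NLTK.
    # Requires the WordNet corpus: nltk.download('wordnet').
    from nltk.corpus import wordnet

    scores = [
        s1.wup_similarity(s2)
        for s1, s2 in product(wordnet.synsets(word_a), wordnet.synsets(word_b))
    ]
    scores = [s for s in scores if s is not None]
    return max(scores, default=0.0)


def phrase_similarity(phrase_a, phrase_b, word_similarity=wordnet_word_similarity):
    """Hypothetical heuristic: average best word-to-word match, symmetrised."""
    tokens_a = phrase_a.lower().split()
    tokens_b = phrase_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0

    def directed(source, target):
        # For each source token, keep its best score against any target token.
        best = [max(word_similarity(s, t) for t in target) for s in source]
        return sum(best) / len(source)

    # Average both directions so the result does not depend on argument order.
    return (directed(tokens_a, tokens_b) + directed(tokens_b, tokens_a)) / 2
```

With WordNet available, `phrase_similarity('social network', 'social networking service')` gives a score between 0 and 1. Treat it as a crude signal: it ignores word order and word sense in context, and any other word-level measure can be passed in through the `word_similarity` argument.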
This is a nice web implementation of many WordNet-based similarity measures
Some further reading on interpreting compounds using WordNet similarity (although not evaluating similarity on compounds), if you're interested:
- CiteSeerX (citations are clearer)
- Same article, PDF