Problem description
Short version:
If I have a stemmed word, say 'comput' for 'computing' or 'sugari' for 'sugary',
is there a way to construct its closest noun form, that is, 'computer' or 'sugar' respectively?
Longer version:
I'm using Python with NLTK and WordNet to perform a few semantic similarity tasks on a bunch of words.
I noticed that most semantic similarity scores work well only for nouns, while adjectives and verbs don't give any results.
Accepting the inaccuracies involved, I wanted to convert a word from its verb/adjective form to its noun form, so that I can get an estimate of their similarity (instead of the 'None' that normally gets returned for adjectives).
I thought one way to do this would be to use a stemmer to get at the root word, and then try to construct the closest noun form of that root.
George-Bogdan Ivanov's algorithm from here works pretty well. I wanted to try alternative approaches. Is there any better way to convert a word from adjective/verb form to noun form?
Recommended answer
First extract all the possible candidates from the WordNet synsets. Then use difflib to compare the strings against your target stem.
>>> from nltk.corpus import wordnet as wn
>>> from itertools import chain
>>> from difflib import get_close_matches as gcm
>>> target = "comput"
>>> # In NLTK 3.x lemma_names is a method, so it has to be called.
>>> candidates = set(chain(*[ss.lemma_names() for ss in wn.all_synsets('n') if any(target in name for name in ss.lemma_names())]))
>>> gcm(target, candidates)[0]
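To see why one candidate wins, it can help to look at the similarity ratio that difflib computes for each string. The snippet below is only a sketch: sample_candidates is a small made-up list standing in for the WordNet noun lemmas, so it runs without the corpus installed.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical stand-in for the lemma names pulled from WordNet.
sample_candidates = ["computation", "computer", "computing"]
target = "comput"

# Ratio is 2 * matched_chars / total_chars, so shorter candidates that
# contain the whole stem score higher.
scores = {
    c: round(SequenceMatcher(None, target, c).ratio(), 3)
    for c in sample_candidates
}

# get_close_matches returns the candidates sorted best-first,
# dropping anything below its default cutoff of 0.6.
ranked = get_close_matches(target, sample_candidates)

print(scores)
print(ranked)
```

With these three strings, 'computer' ranks first because it is the shortest candidate that contains the full stem.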
A more human-readable way to compute the candidates:
candidates = set()
for ss in wn.all_synsets('n'):
    for lemma in ss.lemma_names():  # every lemma name in this synset
        if target in lemma:
            candidates.add(lemma)  # keep the lemma, not the stem itself
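The two steps can be wrapped in one function. This is a minimal sketch, not part of the original answer: closest_noun and its parameters are hypothetical names, and it takes the lemma names as a plain iterable so it can be tried without WordNet. It also returns None when nothing matches, since indexing the result with [0] would otherwise raise an IndexError.

```python
from difflib import get_close_matches


def closest_noun(stem, lemma_names, cutoff=0.6):
    """Return the lemma closest to `stem`, or None if there is no match.

    `lemma_names` is any iterable of strings; with NLTK 3.x it could be
    built from the noun synsets, which is an assumption about your setup.
    """
    # Step 1: keep only the lemmas that contain the stem.
    candidates = {name for name in lemma_names if stem in name}
    # Step 2: rank the candidates by string similarity to the stem.
    matches = get_close_matches(stem, sorted(candidates), n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical lemma list used only for illustration.
sample_lemmas = ["computer", "computation", "computing", "sugar", "table"]
print(closest_noun("comput", sample_lemmas))
print(closest_noun("xyz", sample_lemmas))
```

Note that the substring filter is strict: a stem such as 'sugari' is not contained in 'sugar', so that case would need a looser filter, for example matching on a shortened prefix of the stem.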