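I want a Python library function that translates/converts words across different parts of speech. Sometimes it should output multiple words (e.g. "coder" and "code" are both nouns derived from the verb "to code": one is the subject, the other is the object).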

# :: String => List of String
print(verbify('writer')) # => ['write']
print(nounize('written')) # => ['writer']
print(adjectivate('write')) # => ['written']

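I mostly care about verbs <=> nouns, for a note-taking program I want to write. That is, I can write either "caffeine antagonizes A1" or "caffeine is an A1 antagonist", and with some NLP it can figure out that they mean the same thing. (I know that's not easy, and that it takes NLP that parses rather than just tags, but I want to hack up a prototype.)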

Similar questions...
Converting adjectives and adverbs to their noun forms
(That answer only stems down to the root POS. I want to go between POS.)

P.S. The technical term for this in linguistics is conversion: http://en.wikipedia.org/wiki/Conversion_%28linguistics%29

Best answer

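This is more of a heuristic approach. I have only just coded it, so apologies for the style. It uses derivationally_related_forms() from WordNet. I have implemented nounify; I guess verbify works analogously. From what I've tested, it works pretty well: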

from nltk.corpus import wordnet as wn

def nounify(verb_word):
    """ Transform a verb to its related nouns, most frequent first: die -> death """
    verb_synsets = wn.synsets(verb_word, pos="v")

    # Word not found
    if not verb_synsets:
        return []

    # Get all verb lemmas of the word
    # (in NLTK 3, lemmas(), pos(), name() and synset() are methods, not attributes)
    verb_lemmas = [l for s in verb_synsets
                   for l in s.lemmas() if s.pos() == 'v']

    # Get related forms
    derivationally_related_forms = [(l, l.derivationally_related_forms())
                                    for l in verb_lemmas]

    # filter only the nouns
    related_noun_lemmas = [l for drf in derivationally_related_forms
                           for l in drf[1] if l.synset().pos() == 'n']

    # Extract the words from the lemmas
    words = [l.name() for l in related_noun_lemmas]
    len_words = len(words)

    # Build the result in the form of a list containing tuples (word, probability)
    result = [(w, float(words.count(w)) / len_words) for w in set(words)]
    result.sort(key=lambda w: -w[1])

    # return all the possibilities sorted by probability
    return result
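
Since verbify should work analogously, here is a rough sketch of how the same heuristic could be generalized to any source/target POS pair. The names convert, rank_by_frequency and verbify below are my own (they are not part of NLTK), and I am assuming NLTK 3 with the WordNet corpus downloaded. It only covers the 'n' and 'v' tags; adjectives would probably need extra care because WordNet splits them into 'a' and 's'.

```python
from collections import Counter


def rank_by_frequency(words):
    """Return (word, relative frequency) pairs, most frequent first."""
    counts = Counter(words)
    total = sum(counts.values())
    # The comprehension is empty when there are no words, so no division by zero
    return [(w, c / total) for w, c in counts.most_common()]


def convert(word, from_pos, to_pos):
    """Hypothetical helper: map a word from one POS to another via WordNet.

    from_pos and to_pos are WordNet POS tags such as 'n' or 'v'.
    """
    # Imported here so rank_by_frequency stays usable without NLTK installed
    from nltk.corpus import wordnet as wn

    related = []
    for synset in wn.synsets(word, pos=from_pos):
        for lemma in synset.lemmas():
            for rel in lemma.derivationally_related_forms():
                # Keep only related lemmas whose synset has the target POS
                if rel.synset().pos() == to_pos:
                    related.append(rel.name())
    return rank_by_frequency(related)


def verbify(noun_word):
    """Noun -> related verbs, using the same heuristic as nounify."""
    return convert(noun_word, 'n', 'v')
```

With this, something like verbify('writer') would give the candidate verbs with their relative frequencies, and convert(word, 'v', 'n') should behave like nounify above. What actually comes back depends on the derivational links WordNet happens to contain, so treat the top result as a guess rather than a guarantee.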

Regarding python - converting words between verb/noun/adjective forms, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14489309/
