This post covers how to print the part of speech along with a word's WordNet synonyms.

Problem description

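I have the following code, which takes each word from an input text file and uses WordNet to print the word's synonyms, definitions and example sentences. The synonyms are printed synset by synset, so they come out separated by part of speech: the synonyms that are verbs and the synonyms that are adjectives are printed separately.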

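For example, the synonyms of the word flabbergasted are 1) flabbergast, boggle, bowl over, which are verbs, and 2) dumbfounded, dumfounded, flabbergasted, stupefied, thunderstruck, dumbstruck, dumbstricken, which are adjectives.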

How do I print the part of speech along with the synonyms? Here is the code I have so far:

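import nltk
from nltk.corpus import wordnet as wn

# Read the input file and split it into word and punctuation tokens
with open('sample.txt', 'r') as fp:
    data = fp.read()
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]

for a in words:
    print(a)
    # Each synset is one sense of the word; print its details
    syns = wn.synsets(a)
    for s in syns:
        print()
        print("definition:", s.definition())
        print("synonyms:")
        for l in s.lemmas():
            print(l.name())
        print("examples:")
        for b in s.examples():
            print(b)
        print()
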
Solution

Simply call pos() on a synset. To list all the parts of speech a word can take, collect pos() over all of its synsets:

>>> from nltk.corpus import wordnet as wn
>>> syns = wn.synsets('dog')
>>> set([x.pos() for x in syns])
{'n', 'v'}
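The same pos() call can go straight into the per-synset loop from the question. Below is a minimal sketch, assuming NLTK 3 (where pos(), name() and lemmas() are methods rather than attributes) and that the WordNet data has been fetched with nltk.download('wordnet'). The helper describe_synsets and the POS_LABELS table are hypothetical names chosen for illustration; the single-letter keys are the codes WordNet uses for wn.NOUN, wn.VERB, wn.ADJ, wn.ADJ_SAT and wn.ADV.

```python
# Sketch: print each synset's part of speech next to its synonyms.
# Assumes NLTK 3 and the WordNet corpus (nltk.download('wordnet')).
# describe_synsets and POS_LABELS are hypothetical, illustrative names.

# WordNet's single-letter POS codes mapped to readable labels.
POS_LABELS = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective (satellite)",
    "r": "adverb",
}


def describe_synsets(synsets):
    """Return one line per synset: '<pos label>: lemma, lemma, ...'."""
    lines = []
    for s in synsets:
        # Fall back to the raw code if it is not in the table.
        label = POS_LABELS.get(s.pos(), s.pos())
        # Multiword lemma names use underscores, e.g. 'bowl_over'.
        names = [lemma.name().replace("_", " ") for lemma in s.lemmas()]
        lines.append(f"{label}: {', '.join(names)}")
    return lines


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        synsets = wn.synsets("flabbergasted")
    except (ImportError, LookupError) as exc:
        # NLTK itself or its WordNet data is missing.
        print("NLTK or its WordNet data is not available:", exc)
    else:
        for line in describe_synsets(synsets):
            print(line)
```

In the question's own loop, the minimal change is a single print("pos:", s.pos()) just before the "synonyms:" header. Keeping the formatting in a plain function instead means it works on anything that has pos() and lemmas() methods.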

Unfortunately, this doesn't seem to be documented anywhere except the source code, which shows the other methods that can be called on a synset.

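Synset attributes, accessible via methods of the same name:

  • name: The canonical name of this synset, formed using the first lemma of this synset. Note that this may be different from the name passed to the constructor if that string used a different lemma to identify the synset.
  • pos: The synset's part of speech, matching one of the module-level attributes ADJ, ADJ_SAT, ADV, NOUN or VERB.
  • lemmas: A list of the Lemma objects for this synset.
  • definition: The definition for this synset.
  • examples: A list of example strings for this synset.
  • offset: The offset in the WordNet dict file of this synset.
  • lexname: The name of the lexicographer file containing this synset.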
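Because each attribute in the list above is read through a method of the same name, the whole list can be walked generically with getattr. This is a hedged sketch, again assuming NLTK 3 with the WordNet data downloaded; synset_summary and SYNSET_ATTRIBUTES are hypothetical names, and 'dog.n.01' is only a sample synset identifier.

```python
# Sketch: read the documented Synset attributes through their same-named
# methods. Assumes NLTK 3 and the WordNet corpus (nltk.download('wordnet')).
# synset_summary and SYNSET_ATTRIBUTES are hypothetical, illustrative names.

SYNSET_ATTRIBUTES = ("name", "pos", "lemmas", "definition",
                     "examples", "offset", "lexname")


def synset_summary(synset):
    """Call each attribute's method on the synset and collect the results."""
    summary = {}
    for attr in SYNSET_ATTRIBUTES:
        value = getattr(synset, attr)()  # e.g. synset.pos()
        if attr == "lemmas":
            # lemmas() returns Lemma objects; keep just their names.
            value = [lemma.name() for lemma in value]
        summary[attr] = value
    return summary


if __name__ == "__main__":
    try:
        from nltk.corpus import wordnet as wn
        synset = wn.synset("dog.n.01")
    except (ImportError, LookupError) as exc:
        # NLTK itself or its WordNet data is missing.
        print("NLTK or its WordNet data is not available:", exc)
    else:
        for attr, value in synset_summary(synset).items():
            print(f"{attr}: {value}")
```

Iterating over a fixed tuple keeps the output in the same order as the documented list, which makes it easy to compare against the descriptions.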

10-26 19:57