Problem description


I have successfully retrieved synsets connected to a base synset via other semantic relations, as follows:

wn.synset('good.a.01').also_sees()
Out[63]:
[Synset('best.a.01'),
 Synset('better.a.01'),
 Synset('favorable.a.01'),
 Synset('good.a.03'),
 Synset('obedient.a.01'),
 Synset('respectable.a.01')]

wn.synset('good.a.01').similar_tos()
Out[64]:
[Synset('bang-up.s.01'),
 Synset('good_enough.s.01'),
 Synset('goodish.s.01'),
 Synset('hot.s.15'),
 Synset('redeeming.s.02'),
 Synset('satisfactory.s.02'),
 Synset('solid.s.01'),
 Synset('superb.s.02'),
 Synset('well-behaved.s.01')]

However, the antonym relation seems different. I managed to retrieve the lemma connected to my base synset, but was not able to retrieve the actual synset, like so:

wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]

How can I get the synset, rather than the lemma, that is connected via antonymy to my base synset, wn.synset('good.a.01')? TIA

Solution

For some reason, WordNet indexes antonymy relations at the Lemma level instead of the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the real question is whether Lemmas and Synsets have a many-to-many relation or whether each Lemma maps to exactly one Synset.
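Because antonymy lives on Lemma objects, a synset-level answer can be sketched by walking every lemma of the base synset, following its antonym lemmas, and mapping each one back with `Lemma.synset()`. The helper name `antonym_synsets` is hypothetical; it only relies on the NLTK methods already used in this thread (`Synset.lemmas()`, `Lemma.antonyms()`, `Lemma.synset()`).

```python
def antonym_synsets(synset):
    """Return the synsets linked to `synset` by lemma-level antonymy.

    Hypothetical helper: it only calls Synset.lemmas(), Lemma.antonyms()
    and Lemma.synset(), so it should accept NLTK WordNet objects.
    """
    found = {}  # dict used as an insertion-ordered set (dedupes synsets)
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            # Each Lemma belongs to a single Synset, so this lookup is unambiguous.
            found[antonym.synset()] = None
    return list(found)
```

With NLTK loaded, `antonym_synsets(wn.synset('good.a.01'))` would be expected to return `[Synset('bad.a.01')]`, i.e. the synset of the `Lemma('bad.a.01.bad')` shown in the question.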


In the case of ambiguous words (one word, many meanings), we have a one-to-many String-to-Synset relation, e.g.:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

In the case of one meaning/concept with multiple representations, we have a one-to-many Synset-to-String relation (where String refers to Lemma names):

>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']

Note: so far we have been comparing the relationship between Strings and Synsets, not between Lemmas and Synsets.
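Putting the two directions together, Strings and Synsets are many-to-many. One way to see this is to invert `lemma_names()` over a collection of synsets: one string can collect several synsets, and one synset contributes several strings. `string_to_synsets` is a hypothetical helper; it assumes objects exposing NLTK's `Synset.name()` and `Synset.lemma_names()`.

```python
from collections import defaultdict


def string_to_synsets(synsets):
    """Invert Synset -> lemma names into lemma name -> set of synset names.

    Hypothetical helper; expects objects with NLTK-style name() and
    lemma_names() methods (e.g. the list returned by wn.synsets('dog')).
    """
    index = defaultdict(set)
    for ss in synsets:
        for lemma_name in ss.lemma_names():
            index[lemma_name].add(ss.name())
    return dict(index)
```

Called on `wn.synsets('dog')`, the entry for `'dog'` should list several synset names (one string, many senses), while `dog.n.01` alone contributes `dog`, `domestic_dog` and `Canis_familiaris` (one sense, many strings).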


The "cute" thing is that Lemma and String have a one-to-one relationship:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'

The _name attribute of a Lemma object holds a single unicode string, not a list. See the code at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444.
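The printed identifier of a Lemma, e.g. `Lemma('dog.n.01.dog')`, makes the same point: it is one synset name followed by one lemma name. The hypothetical `split_lemma_key` below splits such an identifier; it assumes the `word.pos.NN` synset naming seen throughout this thread and that the word part contains no `.<digits>.` sequence, and it splits right after the sense number so lemma names that themselves contain dots stay intact.

```python
import re

# Synset names look like word.pos.NN; the first ".<digits>." marks the end of
# the synset part of a lemma identifier such as 'dog.n.01.dog'.
_SENSE_NUM = re.compile(r"\.\d+\.")


def split_lemma_key(key):
    """Split 'dog.n.01.domestic_dog' into ('dog.n.01', 'domestic_dog')."""
    match = _SENSE_NUM.search(key)
    if match is None:
        raise ValueError("not a lemma identifier: %r" % key)
    # Drop the dot that separates the sense number from the lemma name.
    return key[:match.end() - 1], key[match.end():]
```

For example, `split_lemma_key('dog.n.01.Canis_familiaris')` gives `('dog.n.01', 'Canis_familiaris')`: exactly one synset name and one lemma name string.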

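And it seems that each Lemma belongs to exactly one Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:

  Lemma attributes, accessible via methods with the same name::

  - name: The canonical name of this lemma.
  - synset: The synset that this lemma belongs to.
  - syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative modified noun. See: http://wordnet.princeton.edu/man/wninput.5WN.html#sect10 For all other parts of speech, this attribute is None.
  - count: The frequency of this lemma in wordnet.
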
So we can do this, knowing that each Lemma object will return exactly one synset:

>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
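The one-synset-per-lemma property can also be checked empirically instead of being taken from the docstring: every lemma listed under a synset should report that same synset via `Lemma.synset()`. `lemma_synset_mismatches` is a hypothetical helper; with NLTK one could pass it `wn.all_synsets()` and would expect an empty list back.

```python
def lemma_synset_mismatches(synsets):
    """Return (synset, lemma) pairs where lemma.synset() differs from the
    synset the lemma was listed under.

    Hypothetical helper for NLTK-style objects exposing Synset.lemmas()
    and Lemma.synset().
    """
    mismatches = []
    for ss in synsets:
        for lemma in ss.lemmas():
            if lemma.synset() != ss:
                mismatches.append((ss, lemma))
    return mismatches
```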


Assuming that you are trying to do some sentiment analysis and need the antonyms of every adjective in WordNet, you can easily get the Synsets of the antonyms like this:

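>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     # Antonymy is stored between Lemmas, so collect each lemma's antonym
...     # lemmas and map them back to their (single) Synset.
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print(ss, ':', get_antonyms(ss))
...
Synset('able.a.01') : {Synset('unable.a.01')}
...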
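For a sentiment lexicon it may be more convenient to collect these results into a lookup table and leave out adjectives that have no antonym lemmas at all. `build_antonym_table` is a hypothetical sketch keyed by synset name; with NLTK the input would be `wn.all_synsets(pos='a')`, and it uses only `Synset.lemmas()`, `Lemma.antonyms()`, `Lemma.synset()` and `Synset.name()`.

```python
def build_antonym_table(synsets):
    """Map each synset name to the sorted names of its antonym synsets.

    Synsets without any antonym lemma are left out. Hypothetical helper
    for NLTK-style Synset/Lemma objects.
    """
    table = {}
    for ss in synsets:
        # Follow lemma-level antonymy and lift each hit back to its synset.
        antonyms = {a.synset().name() for l in ss.lemmas() for a in l.antonyms()}
        if antonyms:
            table[ss.name()] = sorted(antonyms)
    return table
```

Keying by name strings rather than Synset objects keeps the table easy to serialize, e.g. to JSON.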

06-16 15:15