I have successfully retrieved synsets connected to a base synset via other semantic relations, as follows:
wn.synset('good.a.01').also_sees()
Out[63]:
[Synset('best.a.01'),
Synset('better.a.01'),
Synset('favorable.a.01'),
Synset('good.a.03'),
Synset('obedient.a.01'),
Synset('respectable.a.01')]
wn.synset('good.a.01').similar_tos()
Out[64]:
[Synset('bang-up.s.01'),
Synset('good_enough.s.01'),
Synset('goodish.s.01'),
Synset('hot.s.15'),
Synset('redeeming.s.02'),
Synset('satisfactory.s.02'),
Synset('solid.s.01'),
Synset('superb.s.02'),
Synset('well-behaved.s.01')]
However, the antonym relation seems different. I managed to retrieve the lemma connected to my base synset, but was not able to retrieve the actual synset, like so:
wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]
How can I get the synset, and not the lemma, that is connected via antonymy to my base synset - wn.synset('good.a.01') ? TIA
For some reason, WordNet indexes antonymy relations at the Lemma level instead of the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-one relation.
In the case of ambiguous words (one word, many meanings), we have a one-to-many relation from String to Synset, e.g.
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
In the case of one meaning/concept with multiple representations, we have a one-to-many relation from Synset to String (where String refers to Lemma names):
>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']
Note: up to now, we have been comparing the relationships between Strings and Synsets, not between Lemmas and Synsets.
The "cute" thing is that a Lemma and its String have a one-to-one relationship, i.e. each Lemma object has exactly one name:
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'
The _name attribute of a Lemma object holds a unicode string, not a list (see https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444).
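It also seems that each Lemma has a one-to-one relation with its Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:
Lemma attributes, accessible via methods with the same name::
- name: The canonical name of this lemma.
- synset: The synset that this lemma belongs to.
- syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative to the modified noun. See: http://wordnet.princeton.edu/man/wninput.5WN.html#sect10 For all other parts of speech, this attribute is None.
- count: The frequency of this lemma in wordnet.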
So we can do this, knowing that each Lemma object will return exactly one Synset:
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
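Applied to the synset from the question, the same lookup can be wrapped in a small helper. This is only a sketch: antonym_synsets is a hypothetical name, not part of NLTK, and it assumes nothing beyond the lemmas(), antonyms() and synset() methods already used above.

```python
def antonym_synsets(synset):
    """Return the synsets of all antonym lemmas of `synset`, without duplicates.

    Hypothetical helper (not an NLTK function). It only relies on the
    Synset.lemmas(), Lemma.antonyms() and Lemma.synset() methods.
    """
    found = []
    for lemma in synset.lemmas():
        for antonym in lemma.antonyms():
            target = antonym.synset()  # each antonym Lemma belongs to exactly one Synset
            if target not in found:    # keep first-seen order, drop repeats
                found.append(target)
    return found
```

With NLTK and the WordNet corpus installed, antonym_synsets(wn.synset('good.a.01')) should give a list containing Synset('bad.a.01'), since that is the synset of the Lemma('bad.a.01.bad') shown in the question.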
Assuming that you are trying to do some sentiment analysis and need the antonyms of every adjective in WordNet, you can easily do this to get the Synsets of the antonyms:
>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     # Collect the synset of every antonym lemma of every lemma in ss.
...     return set(chain.from_iterable((a.synset() for a in l.antonyms()) for l in ss.lemmas()))
...
>>> for ss in all_adj_in_wn:
...     print(ss, ':', get_antonyms(ss))
...
Synset('able.a.01') : {Synset('unable.a.01')}
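Many synsets have no lemma-level antonym at all, so the loop above prints an empty set for them. A possible variation, sketched here under the assumption that you only care about synsets that do have antonyms, collects the results into a dictionary instead of printing them. The name antonym_map is hypothetical and not part of NLTK.

```python
def get_antonyms(synset):
    """Return the set of synsets reached through the antonyms of any lemma of `synset`."""
    return {ant.synset() for lemma in synset.lemmas() for ant in lemma.antonyms()}


def antonym_map(synsets):
    """Map each synset that has at least one antonym to its antonym synsets.

    Hypothetical helper (not an NLTK function). `synsets` is any iterable of
    objects offering lemmas(), e.g. the result of wn.all_synsets(pos='a').
    """
    result = {}
    for ss in synsets:
        antonyms = get_antonyms(ss)
        if antonyms:  # skip synsets without a lemma-level antonym
            result[ss] = antonyms
    return result
```

With NLTK and the WordNet corpus installed, antonym_map(wn.all_synsets(pos='a')) would build the mapping for all adjectives in one pass.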