I have successfully retrieved the synsets that are connected to a base synset through other semantic relations, as follows:
wn.synset('good.a.01').also_sees()
Out[63]:
[Synset('best.a.01'),
Synset('better.a.01'),
Synset('favorable.a.01'),
Synset('good.a.03'),
Synset('obedient.a.01'),
Synset('respectable.a.01')]
wn.synset('good.a.01').similar_tos()
Out[64]:
[Synset('bang-up.s.01'),
Synset('good_enough.s.01'),
Synset('goodish.s.01'),
Synset('hot.s.15'),
Synset('redeeming.s.02'),
Synset('satisfactory.s.02'),
Synset('solid.s.01'),
Synset('superb.s.02'),
Synset('well-behaved.s.01')]
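However, the antonymy relation seems to work differently. I managed to retrieve the lemmas connected to the base synset, but not the actual synsets, as shown below: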
wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]
How can I get the synsets, rather than the lemmas, that are connected to my base synset wn.synset('good.a.01') through the antonymy relation? TIA
Best answer
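For some reason, WordNet indexes the antonymy relation at the Lemma level rather than at the Synset level (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have a many-to-many or a one-to-one relationship.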
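Why that distinction matters can be shown with a toy sketch in plain Python. The data below is a small hand-written fragment modelled on the good/bad example in the question; the dictionary names and the helper lift_to_synsets are hypothetical, not NLTK API. If every lemma belongs to exactly one synset, a lemma-level antonym relation can be lifted to synsets simply by mapping both ends of each pair through the lemma-to-synset table:

```python
# Hypothetical hand-written fragment: lemma key -> the single synset it belongs to.
# Because each lemma has exactly one synset, a plain dict (a function) is enough.
LEMMA_TO_SYNSET = {
    "good.a.01.good": "good.a.01",
    "bad.a.01.bad": "bad.a.01",
    "able.a.01.able": "able.a.01",
    "unable.a.01.unable": "unable.a.01",
}

# Antonymy recorded between lemmas, the way WordNet stores it.
LEMMA_ANTONYMS = {
    "good.a.01.good": ["bad.a.01.bad"],
    "bad.a.01.bad": ["good.a.01.good"],
    "able.a.01.able": ["unable.a.01.unable"],
    "unable.a.01.unable": ["able.a.01.able"],
}


def lift_to_synsets(lemma_to_synset, lemma_antonyms):
    """Map every lemma-level antonym pair onto the synsets owning both lemmas."""
    synset_antonyms = {}
    for lemma, antonyms in lemma_antonyms.items():
        source = lemma_to_synset[lemma]
        targets = synset_antonyms.setdefault(source, set())
        for antonym in antonyms:
            targets.add(lemma_to_synset[antonym])
    return synset_antonyms


print(lift_to_synsets(LEMMA_TO_SYNSET, LEMMA_ANTONYMS)["good.a.01"])  # {'bad.a.01'}
```

If a lemma could belong to several synsets, the lemma-to-synset table would be one-to-many and the lift would become ambiguous, which is why the relationship type is the thing to establish.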
For ambiguous words (one word with multiple meanings), we have a one-to-many relationship between String and Synset, e.g.:
>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
In the case of one meaning/concept with multiple representations, we have a one-to-many relationship between Synset and String (where String means a lemma name):
>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']
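Note: so far we have been comparing Strings with Synsets, not looking at the relationship between Lemmas and Synsets. The "cute" thing is that a Lemma and its String have a one-to-one relationship:
>>> wn.synsets('dog')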
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'
The _name attribute of a Lemma object returns a unicode string, not a list. From the code at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444, it looks like each Lemma has a one-to-one relationship with a Synset. From the docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:
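Lemma attributes, accessible via methods with the same name:
- name: The canonical name of this lemma.
- synset: The synset that this lemma belongs to.
- syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative modified noun. See: http://wordnet.princeton.edu/man/wninput.5WN.html#sect10 For all other parts of speech, this attribute is None.
- count: The frequency of this lemma in wordnet.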
So we can do the following, and rely on each Lemma object returning exactly one synset:
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')
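If you would rather check this than take it on trust, here is a minimal duck-typed sketch (the function name lemmas_point_back is hypothetical). It only calls lemmas() on each synset and synset() on each lemma, both of which NLTK's Synset and Lemma objects provide, so you should be able to pass it wn.all_synsets(). It returns False as soon as some lemma reports a different synset from the one it was listed under:

```python
def lemmas_point_back(synsets):
    """Return True if every lemma listed under a synset reports that same synset.

    Duck-typed: works with any objects exposing synset.lemmas() and
    lemma.synset(), e.g. the synsets yielded by NLTK's wn.all_synsets().
    """
    for ss in synsets:
        for lemma in ss.lemmas():
            # A lemma pointing at a different synset would break the
            # "one synset per lemma" assumption the answer relies on.
            if lemma.synset() != ss:
                return False
    return True
```

For example, lemmas_point_back(wn.all_synsets()) walks the whole database, so expect it to take a while; it should return True if each lemma really has exactly one synset.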
Assuming you are doing sentiment analysis and need the antonyms of every adjective in WordNet, you can easily collect the antonym synsets like this:
>>> from itertools import chain
>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     # Antonymy lives on lemmas, so map each antonym lemma to its (single) synset.
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print(ss, ':', get_antonyms(ss))
...
Synset('able.a.01') : {Synset('unable.a.01')}
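If you need the mapping as data rather than printed output, a sketch along the same lines collects it into a dict and skips synsets that have no antonyms. The names antonym_synsets and build_antonym_index are hypothetical; like get_antonyms above, they rely only on Synset.lemmas(), Lemma.antonyms() and Lemma.synset(), and they use synsets as dict keys and set members, which works because NLTK synsets are hashable (the original answer already puts them in a set).

```python
def antonym_synsets(ss):
    """Antonym synsets of ss: lift each lemma's antonym lemmas to their synsets."""
    return {ant.synset() for lemma in ss.lemmas() for ant in lemma.antonyms()}


def build_antonym_index(synsets):
    """Map each synset that has at least one antonym to its set of antonym synsets."""
    index = {}
    for ss in synsets:
        antonyms = antonym_synsets(ss)
        if antonyms:  # synsets without lemma-level antonyms are left out
            index[ss] = antonyms
    return index
```

For example, build_antonym_index(wn.all_synsets(pos='a')).get(wn.synset('good.a.01')) should contain Synset('bad.a.01'), the synset of the Lemma('bad.a.01.bad') returned in the question.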
For more on "python - How to retrieve the antonym synsets of a target synset in NLTK's Wordnet?", see the similar question on Stack Overflow: https://stackoverflow.com/questions/40957598/