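I'm using nltk and WordNet to link words that belong to the same relation group. For example, "parking" and "building" should share some parent link. I'm using hypernyms, but some words don't get connected in a useful way.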
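    from nltk.corpus import wordnet

    x = wordnet.synset('parking.n.01')
    y = wordnet.synset('building.n.01')
    # _shortest_hypernym_paths is a private NLTK helper; it maps each hypernym to its distance from the synset
    print(x._shortest_hypernym_paths(y))
    print(y._shortest_hypernym_paths(x))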
    {Synset('parking.n.01'): 0, Synset('room.n.02'): 1,
     Synset('position.n.07'): 2, Synset('relation.n.01'): 3,
     Synset('abstraction.n.06'): 4, Synset('entity.n.01'): 5,
     Synset('ROOT'): 6}
    {Synset('building.n.01'): 0, Synset('structure.n.01'): 1,
     Synset('artifact.n.01'): 2, Synset('whole.n.02'): 3,
     Synset('object.n.01'): 4, Synset('physical_entity.n.01'): 5,
     Synset('entity.n.01'): 6, Synset('ROOT'): 7}
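Here the only connection goes through 'entity.n.01', which is effectively the root of almost all physical nouns. How can I get something better than this?
I'd like to get something like "parking" -> "structure" -> "building". The path can be longer, but unrelated words shouldn't end up on it; "monkey", for example, can also be traced up to entity.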
Best answer
I found a useful way to look at the possibilities:
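    from nltk.corpus import wordnet

    def getShortestHypernymPath(word1, word2, nulls=False):
        # Compare every sense of word1 with every sense of word2
        syns1 = wordnet.synsets(word1)
        syns2 = wordnet.synsets(word2)
        for s1 in syns1:
            for s2 in syns2:
                # Most specific ancestor(s) shared by the two senses
                lch = s2.lowest_common_hypernyms(s1)
                # Print pairs that share a hypernym (or every pair if nulls=True)
                if len(lch) > 0 or nulls:
                    print(s1, '<-->', s2, '===', lch)

    getShortestHypernymPath('parking', 'building', nulls=False)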
This prints:
    Synset('parking.n.01') <--> Synset('building.n.01') === [Synset('entity.n.01')]
    Synset('parking.n.01') <--> Synset('construction.n.01') === [Synset('abstraction.n.06')]
    Synset('parking.n.01') <--> Synset('construction.n.07') === [Synset('abstraction.n.06')]
    Synset('parking.n.01') <--> Synset('building.n.04') === [Synset('abstraction.n.06')]
    Synset('parking.n.02') <--> Synset('building.n.01') === [Synset('entity.n.01')]
    Synset('parking.n.02') <--> Synset('construction.n.01') === [Synset('act.n.02')]
    Synset('parking.n.02') <--> Synset('construction.n.07') === [Synset('act.n.02')]
    Synset('parking.n.02') <--> Synset('building.n.04') === [Synset('abstraction.n.06')]
    Synset('park.v.02') <--> Synset('build.v.05') === [Synset('control.v.01')]
So at least I can moderate the results myself.
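One way to automate part of that moderation might be to discard pairs whose lowest common hypernym sits too close to the root (such as 'entity.n.01' or 'abstraction.n.06') and to spell out the chain through the remaining ones. The sketch below is only an illustration under assumptions: the helper names (`specific_common_hypernyms`, `path_via`, `link`, `link_words`) and the `min_depth` threshold are hypothetical and not part of NLTK. It relies on the existing `Synset` methods `lowest_common_hypernyms()`, `min_depth()` and `hypernym_paths()`.

```python
def specific_common_hypernyms(s1, s2, min_depth=3):
    """Lowest common hypernyms of s1 and s2 that lie at least
    min_depth levels below the root (threshold is an assumption)."""
    return [h for h in s1.lowest_common_hypernyms(s2)
            if h.min_depth() >= min_depth]


def path_via(synset, ancestor):
    """Shortest chain synset -> ... -> ancestor, or None if unreachable."""
    best = None
    # hypernym_paths() yields lists that run from the root down to synset
    for path in synset.hypernym_paths():
        if ancestor in path:
            chain = path[path.index(ancestor):][::-1]
            if best is None or len(chain) < len(best):
                best = chain
    return best


def link(s1, s2, min_depth=3):
    """Chains s1 -> ... -> h -> ... -> s2 for every sufficiently
    specific common hypernym h; an empty list means 'no useful link'."""
    links = []
    for h in specific_common_hypernyms(s1, s2, min_depth):
        up = path_via(s1, h)
        down = path_via(s2, h)
        if up and down:
            # down is s2 -> ... -> h; reverse it and drop the duplicate h
            links.append(up + down[::-1][1:])
    return links


def link_words(word1, word2, min_depth=3):
    """Try every pair of senses; needs nltk and the WordNet corpus."""
    from nltk.corpus import wordnet  # imported lazily on purpose
    result = []
    for s1 in wordnet.synsets(word1):
        for s2 in wordnet.synsets(word2):
            result.extend(link(s1, s2, min_depth))
    return result
```

Calling `link_words('parking', 'building', min_depth=3)` would then return only the chains that meet below the chosen depth. Which threshold is reasonable depends on the vocabulary, so it is worth trying a few values and still checking the surviving links by hand.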