machine-learning - Using WordNet to generalize specific words to higher-order concepts

Does WordNet have "higher-order" concepts? How do I generate them for a given word?

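I have a corpus of data in the form of Prolog "facts". I want to generalize them into their conceptual components, i.e. 'contains'('oranges', 'vitamin c'). and 'contains'('spinach','iron'). would generalize to 'contains'(<food>, <nutrient>).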

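I don't know WordNet very well, so one thing I was considering is to generate all possible hypernyms and then combinatorially build every possible new rule, but that is a "brute force" approach.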

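Does WordNet store higher-order concepts such as <food>? That would make things easier, because I could then create a single new rule using the higher-order concept for that particular variable (assuming WordNet has one), instead of the fifty or a hundred rules I might end up with if I did it the brute-force way.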

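So what I really want to know is: is there a command that generates the higher-order concepts for each of the three components of a given "fact"? Or perhaps just for the two inside the parentheses. If such a command exists, what is it?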

Below is some of the data I am working with.

'be'('mr jiang', 'representing china').
'be'('hrh', 'britain').
'be more than'('# distinguished guests', 'the principal representatives').
'end with'('the playing of the british national anthem', 'hong kong').
'follow at'('the stroke of midnight', 'this').
'take part in'('the ceremony', 'both countries').
'start at about'('# pm', 'the ceremony').
'end about'('# am', 'the ceremony').
'lower'('the british hong kong flag', '# royal hong kong police officers').
'raise'('the sar flag', 'another #').
'leave for'('the royal yacht britannia', 'the #').
'hold by'('the chinese and british governments', 'the handover of hong kong').
'rise over'('this land', 'the regional flag of the hong kong special administrative region of the people \'s republic of china').
'cast eye on'('hong kong', 'the world').
'hold on'('schedule', 'the # governments').
'be festival for'('the chinese nation', 'this').
'go in'('the annals of history', 'july # , #').
'become master of'('this chinese land', 'the hong kong compatriots').
'enter era of'('development', 'hong kong').
'remember'('mr deng xiaoping', 'history').
'be along'('the course', 'it').
'resolve'('the hong kong question', 'we').
'wish to express thanks to'('all the personages', 'i').
'contribute to'('the settlement of the hong kong', 'both china and britain').
'support'('hong kong \'s return', 'the world').

Best answer

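WordNet refers to higher-order concepts as "hypernyms". For example, a hypernym of the color "green" is "chromatic color", because green belongs to the higher-order class of chromatic colors.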

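Note that WordNet distinguishes between "words" (strings) and "synsets" (the meanings we associate with a given string). Just as a word can have several meanings, a string can have several synsets. If you want to retrieve all the higher-order meanings of a given word, you can run the following lines in Python: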

from nltk.corpus import wordnet as wn

# Requires the WordNet corpus; install it once with nltk.download('wordnet').
# List every synset (sense) of "green", then print each synset's direct hypernyms.
for synset in wn.synsets('green'):
    for hypernym in synset.hypernyms():
        print(synset, hypernym)

Regarding machine-learning - using WordNet to generalize specific words to higher-order concepts, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28088226/