According to the documentation, I can load a sense-tagged corpus in NLTK like this:
>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
I can also get the definition, pos, offset and examples like this:
>>> from nltk.corpus import wordnet as wn
>>> wn.synset('dog.n.01').examples()
>>> wn.synset('dog.n.01').definition()
But how do I get the frequency of a synset from a corpus? To break the problem down:
Best answer
I managed to do it this way.
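from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

# Map each sense ("offset-pos") to the summed counts of its lemmas.
sense2freq = {}
for s in synsets:
    freq = 0
    for lemma in s.lemmas():  # lemmas, offset and pos are methods in NLTK 3
        freq += lemma.count()
    sense2freq[str(s.offset()) + "-" + s.pos()] = freq

for s in sense2freq:
    print(s, sense2freq[s])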
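The loop above gives raw counts per sense. If what you want is a frequency in the sense of a proportion, one possible follow-up is to divide each count by the total over all senses of the word. The sketch below is only an illustration, not part of the original answer: the helper names `sense_counts` and `relative_frequencies` are made up here, and it assumes the NLTK 3 style API where `lemmas()`, `offset()`, `pos()` and `count()` are methods.

```python
def sense_counts(synsets):
    """Return {"offset-pos": summed lemma counts} for the given synsets.

    Hypothetical helper: it accepts any objects exposing lemmas(),
    offset() and pos(), which NLTK 3 Synset objects are assumed to do.
    """
    counts = {}
    for s in synsets:
        key = str(s.offset()) + "-" + s.pos()
        counts[key] = sum(lemma.count() for lemma in s.lemmas())
    return counts


def relative_frequencies(counts):
    """Normalise raw counts so they sum to 1.0 (all zeros if nothing was counted)."""
    total = sum(counts.values())
    if total == 0:
        return {key: 0.0 for key in counts}
    return {key: value / total for key, value in counts.items()}
```

With NLTK and the WordNet data installed, something like `relative_frequencies(sense_counts(wn.synsets("dog")))` should give the share of each sense of "dog". Note that the counts are summed over every lemma in a synset, not only the lemma matching the query word, which mirrors what the answer's loop does.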
Regarding python - How to get the WordNet sense frequency of a synset in NLTK?, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15551195/