Problem description
I'm using the NLTK CESS ESP data package, and I've been able to POS-tag a sentence using an adaptation of the spaghetti tagger and a HiddenMarkovModelTagger. However, the tags it produces are not at all like the ones used when tagging en_US sentences. In the Categorizing and Tagging documentation for NLTK, the tags are uppercase and contain no numbers or punctuation. Some CESS tags, by contrast: vsip3s0, da0fs0.
Does anyone know of a reference that explains these tags?
Spaghetti Tagger
[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
[('\xc2\xbfQue', None), ('es', None), ('la', None), ('programaci\xc3\xb3n', None), ('orientada', None), ('a', None), ('objetos', None), ('?', None)]
[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
Markov Tagger
[('\xc2\xbfQue', 'sn.e-SUJ'), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', 'ncfs000'), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
Recommended answer
The cess-esp corpus is tagged using an older annotation scheme named EAGLES; you can see it here. Hope this helps.
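To make the positional nature of these tags concrete, here is a minimal sketch of a decoder for a few categories that appear in the output above. The `FIELDS` table and the `decode` helper are hypothetical names introduced for illustration, and the code tables are a partial reconstruction of the EAGLES conventions for Spanish as commonly documented, so check them against the official tagset description before relying on them.

```python
# Partial, illustrative field tables for EAGLES-style Spanish tags.
# Assumption: field positions and codes follow the commonly documented
# EAGLES layout for Spanish; this is not an official or complete table.
GENDER = {"m": "masculine", "f": "feminine", "c": "common", "n": "neuter"}
NUMBER = {"s": "singular", "p": "plural", "n": "invariable"}

FIELDS = {
    "v": ("verb", [
        ("type", {"m": "main", "a": "auxiliary", "s": "semiauxiliary"}),
        ("mood", {"i": "indicative", "s": "subjunctive", "m": "imperative",
                  "n": "infinitive", "g": "gerund", "p": "participle"}),
        ("tense", {"p": "present", "i": "imperfect", "f": "future",
                   "s": "past", "c": "conditional"}),
        ("person", {"1": "1", "2": "2", "3": "3"}),
        ("number", NUMBER),
        ("gender", GENDER),
    ]),
    "d": ("determiner", [
        ("type", {"d": "demonstrative", "p": "possessive",
                  "t": "interrogative", "e": "exclamative",
                  "i": "indefinite", "a": "article"}),
        ("person", {"1": "1", "2": "2", "3": "3"}),
        ("gender", GENDER),
        ("number", NUMBER),
        ("possessor", {"s": "singular", "p": "plural"}),
    ]),
    "n": ("noun", [
        ("type", {"c": "common", "p": "proper"}),
        ("gender", GENDER),
        ("number", NUMBER),
    ]),
    "a": ("adjective", [
        ("type", {"q": "qualificative", "o": "ordinal"}),
        ("degree", {"a": "augmentative", "d": "diminutive",
                    "c": "comparative", "s": "superlative"}),
        ("gender", GENDER),
        ("number", NUMBER),
        ("function", {"p": "participle"}),
    ]),
}


def decode(tag):
    """Return a dict of decoded fields, or None for categories not covered."""
    tag = tag.lower()
    if not tag or tag[0] not in FIELDS:
        return None
    category, fields = FIELDS[tag[0]]
    result = {"category": category}
    # Walk the tag one character per field; "0" (not applicable) and any
    # unrecognized code are simply left out of the result.
    for (name, codes), ch in zip(fields, tag[1:]):
        if ch in codes:
            result[name] = codes[ch]
    return result


for tag in ("vsip3s0", "da0fs0", "ncmp000", "aq0fsp"):
    print(tag, decode(tag))
```

Under these assumptions, `vsip3s0` reads as a semiauxiliary verb in the present indicative, third person singular, and `da0fs0` as a feminine singular article. Tags outside the four categories in the table, such as the punctuation tag `Fit`, return `None` rather than a guess.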