
Question

I'm using the NLTK CESS ESP data package, and I've been able to use an adaptation of the spaghetti tagger and a HiddenMarkovModelTagger to POS-tag a sentence. However, the tags they produce are not at all like the ones used when tagging en_US sentences. In the Categorizing and Tagging chapter of the NLTK documentation, you'll notice that the tags used are uppercase and don't contain any numbers or punctuation. Some cess tags: vsip3s0, da0fs0.

Does anyone know of a reference that explains these tags?

Spaghetti Tagger

[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
[('\xc2\xbfQue', None), ('es', None), ('la', None), ('programaci\xc3\xb3n', None), ('orientada', None), ('a', None), ('objetos', None), ('?', None)]
[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]
[('\xc2\xbfQue', None), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', None), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]

Markov Tagger

[('\xc2\xbfQue', 'sn.e-SUJ'), ('es', 'vsip3s0'), ('la', 'da0fs0'), ('programaci\xc3\xb3n', 'ncfs000'), ('orientada', 'aq0fsp'), ('a', 'sps00'), ('objetos', 'ncmp000'), ('?', 'Fit')]

Recommended answer

The cess-esp corpus is tagged with an older annotation scheme called EAGLES; the original answer linked to a description of it. Hope this helps.
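EAGLES-style tags are positional: the first letter gives the word category, and each following character encodes one attribute, with `0` meaning "not applicable". As a rough illustration, here is a minimal sketch of a decoder for the categories that appear in the question. The `decode_tag` helper and the `EAGLES_FIELDS` table are hypothetical names, and the value tables reflect my understanding of the Spanish EAGLES recommendations, so check them against the tagset reference before relying on them. Labels such as `sn.e-SUJ` look like syntactic annotations rather than POS tags and are simply reported as unknown here.

```python
# Shared value tables. Assumption: these follow the EAGLES recommendations for
# Spanish as commonly documented; verify against the tagset reference.
GENDER = {"m": "masculine", "f": "feminine", "c": "common", "n": "neuter"}
NUMBER = {"s": "singular", "p": "plural", "n": "invariable"}

# First letter -> (category, ordered list of (attribute name, value table)).
# Only the leading attributes of each category are listed in this sketch.
EAGLES_FIELDS = {
    "v": ("verb", [
        ("type", {"m": "main", "a": "auxiliary", "s": "semiauxiliary"}),
        ("mood", {"i": "indicative", "s": "subjunctive", "m": "imperative",
                  "n": "infinitive", "g": "gerund", "p": "participle"}),
        ("tense", {"p": "present", "i": "imperfect", "f": "future",
                   "s": "past", "c": "conditional"}),
        ("person", {"1": "first", "2": "second", "3": "third"}),
        ("number", NUMBER),
        ("gender", GENDER),
    ]),
    "d": ("determiner", [
        ("type", {"d": "demonstrative", "p": "possessive", "t": "interrogative",
                  "e": "exclamative", "i": "indefinite", "a": "article"}),
        ("person", {"1": "first", "2": "second", "3": "third"}),
        ("gender", GENDER),
        ("number", NUMBER),
    ]),
    "a": ("adjective", [
        ("type", {"q": "qualifying", "o": "ordinal"}),
        ("degree", {"a": "augmentative", "d": "diminutive",
                    "c": "comparative", "s": "superlative"}),
        ("gender", GENDER),
        ("number", NUMBER),
        ("function", {"p": "participle"}),
    ]),
    "n": ("noun", [
        ("type", {"c": "common", "p": "proper"}),
        ("gender", GENDER),
        ("number", NUMBER),
    ]),
    "s": ("adposition", [
        ("type", {"p": "preposition"}),
        ("form", {"s": "simple", "c": "contracted"}),
    ]),
    "f": ("punctuation", []),
}


def decode_tag(tag):
    """Hypothetical helper: expand one positional tag into named attributes."""
    # None, empty strings, and labels with punctuation (e.g. 'sn.e-SUJ')
    # are not positional POS tags, so report them as unknown.
    if not tag or not tag.isalnum():
        return {"category": "unknown", "raw": tag}
    code = tag.lower()
    entry = EAGLES_FIELDS.get(code[0])
    if entry is None:
        return {"category": "unknown", "raw": tag}
    category, fields = entry
    decoded = {"category": category}
    # Pair each character after the first with its attribute; skip '0' slots.
    for char, (name, values) in zip(code[1:], fields):
        if char != "0":
            decoded[name] = values.get(char, char)
    return decoded


if __name__ == "__main__":
    for tag in ["vsip3s0", "da0fs0", "aq0fsp", "sps00", "ncmp000", "Fit"]:
        print(tag, decode_tag(tag))
```

Read this way, `vsip3s0` is a semiauxiliary verb in the indicative present, third person singular, and `da0fs0` is a feminine singular article, which matches "es" and "la" in the example sentence.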

