Question
The title pretty much sums up the question. I've noticed that some papers refer to a BILOU encoding scheme for NER, as opposed to the typical BIO tagging scheme (for example, this 2009 paper by Ratinov and Roth: http://cogcomp.cs.illinois.edu/page/publication_view/199).
From working with the CoNLL 2003 data, I know that:
B stands for 'beginning' (signifies beginning of an NE)
I stands for 'inside' (signifies that the word is inside an NE)
O stands for 'outside' (signifies that the word is just a regular word outside of an NE)
I've been told that the letters in BILOU stand for:
B - 'beginning'
I - 'inside'
L - 'last'
O - 'outside'
U - 'unit'
I've also seen people reference two other tags:
E - 'end', used alongside the 'last' tag
S - 'singleton', used alongside the 'unit' tag
I'm pretty new to the NER literature, and I've been unable to find anything that clearly explains these tags. In particular, what is the difference between the 'last' and 'end' tags, and what does the 'unit' tag stand for?
Answer
Based on an issue and a patch in ClearTK, it seems that BILOU stands for "Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside" (emphasis added). For instance, the chunking denoted by the parentheses in
(foo foo foo) (bar) no no no (bar bar)
can be encoded with BILOU as
B-foo, I-foo, L-foo, U-bar, O, O, O, B-bar, L-bar
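To make the mapping concrete, here is a minimal Python sketch that turns chunk spans into BILOU tags. The function name `spans_to_bilou` and the `(start, end, label)` span format with an exclusive `end` are assumptions made for illustration; they are not taken from ClearTK or any other library.

```python
from typing import List, Tuple


def spans_to_bilou(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode non-overlapping chunks as BILOU tags.

    Each span is (start, end, label), with `end` exclusive (an assumed format).
    """
    tags = ["O"] * n_tokens  # every token starts as Outside
    for start, end, label in spans:
        if end - start == 1:
            # Single-token chunk: Unit
            tags[start] = f"U-{label}"
        else:
            # Multi-token chunk: Beginning, zero or more Inside, then Last
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


tokens = "foo foo foo bar no no no bar bar".split()
spans = [(0, 3, "foo"), (3, 4, "bar"), (7, 9, "bar")]
print(spans_to_bilou(len(tokens), spans))
# ['B-foo', 'I-foo', 'L-foo', 'U-bar', 'O', 'O', 'O', 'B-bar', 'L-bar']
```

Note that a two-token chunk gets only B and L, with no I in between, which matches the final `B-bar, L-bar` pair in the example.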