Question
I'm looking for a good open source POS Tagger in Java. Here's what I have come up with so far.
- LingPipe
- Stanford
- LBJ
- FastTag
Does anyone have any suggestions?
Recommended answer
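Are you looking to tag POS in a specific domain? Most general-purpose taggers are trained on newswire text, and they typically perform poorly when used in a specific domain such as biomedical text. There are other taggers trained specifically for such domains.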
For newswire text, Adwait Ratnaparkhi's MXPOST is very good and is the one I would recommend.
Other Java implementations include:
- MontyLingua
- Berkeley Parser (not really a POS tagger, but full-blown parsers typically include one; search Google for Java syntactic parsers and you will find many)
- QTag
- LBJ
OpenNLP and LingPipe, as suggested by other posters, are also pretty decent.
Information on the state of the art in POS tagging can be found here. As you can see, LTAG-Spinal (also mentioned by another poster) currently ranks best, but the differences between the taggers are small. I have not used LTAG myself.
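Tagger comparisons like these are usually reported as per-token accuracy: the fraction of tokens in a held-out, hand-tagged test set whose predicted tag matches the gold tag. A minimal Java sketch of that computation, assuming each sentence's tags are given as a `String[]`; the class and method names (`TaggingAccuracy`, `tokenAccuracy`) are hypothetical and not part of any of the libraries mentioned above:

```java
import java.util.List;

public class TaggingAccuracy {

    /** Fraction of tokens whose predicted tag equals the gold tag (hypothetical helper). */
    public static double tokenAccuracy(List<String[]> gold, List<String[]> predicted) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("gold and predicted must have the same number of sentences");
        }
        long correct = 0;
        long total = 0;
        for (int s = 0; s < gold.size(); s++) {
            String[] g = gold.get(s);
            String[] p = predicted.get(s);
            if (g.length != p.length) {
                throw new IllegalArgumentException("sentence " + s + " has mismatched lengths");
            }
            // Count every token; a token is correct only if its tag matches exactly.
            for (int i = 0; i < g.length; i++) {
                if (g[i].equals(p[i])) {
                    correct++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        List<String[]> gold = List.of(
                new String[] {"DT", "NN", "VBZ"},
                new String[] {"PRP", "MD", "VB"});
        List<String[]> predicted = List.of(
                new String[] {"DT", "NN", "NNS"},
                new String[] {"PRP", "MD", "VB"});
        // 5 of 6 tokens are tagged correctly, so this prints roughly 0.833.
        System.out.println(tokenAccuracy(gold, predicted));
    }
}
```

Because the metric is averaged over all tokens, easy and frequent words (determiners, punctuation) dominate it, which is one reason the scores of different taggers tend to sit close together.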
Also note that the baseline performance for POS tagging is about 90%. Here, the baseline means (a) tagging every known word with its most frequent POS tag from a lexicon, and (b) tagging every unknown word as a noun.
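That baseline can be sketched in a few lines of Java. This is a minimal illustration, assuming the lexicon is built by counting (word, tag) pairs in a hand-tagged training corpus and that the Penn Treebank tag `NN` stands for "noun"; the class and method names (`BaselineTagger`, `train`, `tag`) are hypothetical and not the API of any tagger listed above. Lookup is case-sensitive for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

public class BaselineTagger {
    // Assumption: Penn Treebank tagset, where "NN" is the singular common-noun tag.
    private static final String UNKNOWN_WORD_TAG = "NN";

    // Lexicon: word -> (tag -> how often the word carried that tag in training data).
    private final Map<String, Map<String, Integer>> lexicon = new HashMap<>();

    /** Adds the (word, tag) counts of one hand-tagged sentence to the lexicon. */
    public void train(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        for (int i = 0; i < words.length; i++) {
            lexicon.computeIfAbsent(words[i], w -> new HashMap<>())
                   .merge(tags[i], 1, Integer::sum);
        }
    }

    /** (a) most frequent lexicon tag for known words, (b) noun for unknown words. */
    public String[] tag(String[] words) {
        String[] result = new String[words.length];
        for (int i = 0; i < words.length; i++) {
            Map<String, Integer> tagCounts = lexicon.get(words[i]);
            result[i] = (tagCounts == null) ? UNKNOWN_WORD_TAG : mostFrequent(tagCounts);
        }
        return result;
    }

    private static String mostFrequent(Map<String, Integer> tagCounts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : tagCounts.entrySet()) {
            int count = e.getValue();
            // Break ties alphabetically so the result does not depend on HashMap order.
            if (count > bestCount || (count == bestCount && e.getKey().compareTo(best) < 0)) {
                best = e.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BaselineTagger tagger = new BaselineTagger();
        tagger.train(new String[] {"the", "dog", "can", "run"}, new String[] {"DT", "NN", "MD", "VB"});
        tagger.train(new String[] {"a", "can", "of", "soup"}, new String[] {"DT", "NN", "IN", "NN"});
        tagger.train(new String[] {"we", "can", "swim"}, new String[] {"PRP", "MD", "VB"});
        String[] tags = tagger.tag(new String[] {"the", "can", "of", "tuna"});
        System.out.println(String.join(" ", tags)); // DT MD IN NN
    }
}
```

The output also shows the baseline's main weakness: "can" in "the can of tuna" is a noun, but it is tagged `MD` because the modal reading dominates the training counts, while the unknown word "tuna" falls back to `NN`. Context-aware taggers such as the ones listed above use the surrounding words to avoid exactly this kind of error.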