Question
I would like to use named entity recognition (NER) to find suitable tags for texts in a database. Instead of using tools like NLTK or LingPipe, I want to build my own tool.
So my questions are:
- Which algorithm should I use?
- How hard is it to build this tool?
Recommended answer
I did this some time ago when I studied Markov chains.
Anyway, the answers are:
Stanford NLP, for example, uses Conditional Random Fields (CRF). If you are not trying to do this effectively, you are like the dude from Jackass 3D who was pissing in the wind. There is no simple way to parse human language, as its construction is complex and it has tons of exceptions.
Well, if you know what you are doing, then it's not that hard at all. The process of entering the rules and logic can be annoying and time-consuming, and fixing bugs can be nontrivial. But in 20 years, you can make something almost useful (for yourself).
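To give a feel for the Markov-chain route mentioned above, here is a minimal sketch of a hidden Markov model tagger decoded with the Viterbi algorithm. It is a simpler generative cousin of a linear-chain CRF, not what Stanford NLP uses. The class name ToyHmmTagger, the two-tag set, the single capitalisation feature, and every probability are assumptions made up for illustration; a real tagger would estimate them from annotated text.

```java
import java.util.Arrays;

public class ToyHmmTagger {
    // Hypothetical tag set: O = outside any entity, PER = part of a person name.
    static final String[] TAGS = {"O", "PER"};

    // Made-up probabilities (stored as logs), for illustration only.
    static final double[] START = {Math.log(0.8), Math.log(0.2)};
    static final double[][] TRANS = {
        {Math.log(0.8), Math.log(0.2)},   // from O   to O, PER
        {Math.log(0.5), Math.log(0.5)}    // from PER to O, PER
    };

    // Emission score based on one crude feature: is the token capitalised?
    static double emit(int tag, String token) {
        boolean cap = !token.isEmpty() && Character.isUpperCase(token.charAt(0));
        if (tag == 1) {
            return Math.log(cap ? 0.9 : 0.1);   // PER
        }
        return Math.log(cap ? 0.1 : 0.9);       // O
    }

    // Viterbi decoding: returns the most probable tag sequence for the tokens.
    static String[] tag(String[] tokens) {
        int n = tokens.length;
        int k = TAGS.length;
        if (n == 0) {
            return new String[0];
        }
        double[][] score = new double[n][k];  // best log-score ending in tag t at position i
        int[][] back = new int[n][k];         // previous tag on that best path
        for (int t = 0; t < k; t++) {
            score[0][t] = START[t] + emit(t, tokens[0]);
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                int best = 0;
                for (int p = 1; p < k; p++) {
                    if (score[i - 1][p] + TRANS[p][t] > score[i - 1][best] + TRANS[best][t]) {
                        best = p;
                    }
                }
                score[i][t] = score[i - 1][best] + TRANS[best][t] + emit(t, tokens[i]);
                back[i][t] = best;
            }
        }
        // Pick the best final tag, then follow the back-pointers to the start.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (score[n - 1][t] > score[n - 1][last]) {
                last = t;
            }
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = TAGS[last];
            last = back[i][last];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = "yesterday Alice Smith visited the office".split(" ");
        System.out.println(Arrays.toString(tag(tokens)));  // prints [O, PER, PER, O, O, O]
    }
}
```

Even this toy shows where the effort goes: the decoder is short, but a tagger that works on real text needs many more features than capitalisation, plus probabilities learned from a labelled corpus.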