Problem description
We need to develop a back-end application that can parse a full name into:
Prefix (Dr., Mr., Ms., etc.)
First Name
Last Name
Middle Name
etc
The challenge here is that it has to support names from multiple countries and languages. One assumption we have is that we will always get a country and language along with the full name as input.
The full name may come in any format. For the same country/language combination, it may come in as first name then last name, or the reverse. A comma will not be part of the full name.
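To make the input and output concrete, here is a minimal Python sketch of the contract described above. Everything in it is an assumption for illustration: the names `ParsedName`, `parse_full_name`, and `KNOWN_PREFIXES` are hypothetical, and the splitting logic is a naive baseline that assumes "prefix first middle last" order, which is exactly the assumption the requirements say cannot be relied on.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative prefix list only (an assumption); real data would be per language.
KNOWN_PREFIXES = {"dr", "mr", "mrs", "ms", "prof"}


@dataclass
class ParsedName:
    prefix: Optional[str] = None
    first: Optional[str] = None
    middle: Optional[str] = None
    last: Optional[str] = None


def parse_full_name(full_name: str, country: str, language: str) -> ParsedName:
    """Naive split that assumes 'prefix first [middle...] last' order.

    country and language are accepted to match the stated input contract,
    but this baseline does not use them.
    """
    tokens = full_name.split()
    result = ParsedName()
    # Treat the first token as a prefix if it matches the list, ignoring a trailing dot.
    if tokens and tokens[0].rstrip(".").lower() in KNOWN_PREFIXES:
        result.prefix = tokens.pop(0)
    if not tokens:
        return result
    result.first = tokens[0]
    if len(tokens) > 1:
        result.last = tokens[-1]
    if len(tokens) > 2:
        # Everything between the first and last token is treated as middle name(s).
        result.middle = " ".join(tokens[1:-1])
    return result
```

With this baseline, "Dr. John Michael Smith" splits as expected, but "Smith John" comes back with first name "Smith" and last name "John". Positional splitting cannot tell the two orders apart, which is the core difficulty of the question.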
Is it feasible? We are also open to any commercially available software.
Recommended answer
Since the OP was open to any commercially available offering...
"IBM InfoSphere Global Name Analytics" appears to be a commercial solution that satisfies the original request for parsing a [free-form, unstructured] personal name [full name], apparently with a degree of certainty when it comes to resolving some of the name ambiguity issues alluded to in other responses.
Note: I have no personal experience with the product and no association with it. I merely encountered this discussion and the following reference links while re-investigating effectively the same concern as described by the OP. HTH.
A general product documentation link:
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gna_con_gnaoverview.html
Refer to the "Parsing names using NameParser" at
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_np_con_parsingnamesusingnameparser.html
The NameParser is a component API for the product per
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gnm_con_logicalarchitecturecapis.html
Refer to the "Parsing names using IBM NameWorks" at
http://publib.boulder.ibm.com/infocenter/gnrgna/v4r1m0/topic/com.ibm.gnr.gna.ic.doc/topics/gnr_gnm_con_parsingnamesusingnameworks.html
"IBM NameWorks combines the individual IBM InfoSphere Global Name Recognition components into a single, unified, easy-to-use application programming interface (API), and also extends this functionality to Java applications and as a Web service."
To clarify why I think this answers the question and eases some of the difficulties alluded to earlier: if I understood correctly what I read, the APIs use the "NameHunter Server" to search the "IBM InfoSphere Global Name Data Archive (NDA)", which is described as "a collection of nearly one billion names from around the world, along with gender and country of association for each name. This large repository of name information powers the algorithms and rules that IBM InfoSphere Global Name Recognition products use to categorize, classify, parse, genderize, and match names."
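To illustrate the general idea of why a large name repository helps with ordering ambiguity, here is a small Python sketch. It is not IBM's algorithm or API, which I have not used. The table `NAME_COUNTS`, its numbers, and the functions `surname_likelihood` and `order_two_part_name` are all made up for the example. It assumes a lookup of how often a token has been seen as a given name versus a surname in a given country, and uses that to decide which of two tokens is more likely the last name.

```python
# Hypothetical counts per (country, token): (seen as given name, seen as surname).
# The numbers are invented for illustration only.
NAME_COUNTS = {
    ("US", "john"): (9500, 300),
    ("US", "smith"): (40, 9800),
    ("US", "james"): (6000, 2500),
}


def surname_likelihood(token: str, country: str) -> float:
    """Fraction of sightings in which the token was a surname (0.5 if unknown)."""
    given, family = NAME_COUNTS.get((country, token.lower()), (1, 1))
    return family / (given + family)


def order_two_part_name(full_name: str, country: str) -> tuple:
    """Return (first, last) for a two-token name, using counts to pick the order.

    Raises ValueError if the input does not contain exactly two tokens.
    """
    a, b = full_name.split()
    # Swap only when the first token looks more surname-like than the second.
    if surname_likelihood(a, country) > surname_likelihood(b, country):
        return b, a
    return a, b
```

Under these made-up counts, both "John Smith" and "Smith John" resolve to first name "John" and last name "Smith". When neither token is in the table, the input order is kept. A product backed by a repository of the size quoted above would presumably have far better coverage than a toy table, but names that are common in both roles remain ambiguous however much data is available.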
FWIW, I also ran across a "Name Parser" that uses a database of ~140K names, as noted at:
http://www.melissadata.com/dqt/websmart-web-services.htm