Problem description
I'm writing a search feature for a database of NFL players.
The user enters a search string like "Jason Campbell" or "Campbell" or "Jason".
I'm having trouble getting the appropriate results.
Which Analyzer should I use when indexing? Which Query when querying? Should I distinguish between first name and last name or just index the full name string?
I want the following behavior:
Query: "Jason Campbell" -> Result: exact match for 1 player, Jason Campbell
Query: "Campbell" -> Result: all players with Campbell in their name
Query: "Jason" -> Result: all players with Jason in their name
Query: "Cambel" [misspelled] -> Result: all players with Campbell in their name
Recommended answer
StandardAnalyzer should work fine for all of the above queries. Enclose your first query in double quotes for an exact (phrase) match; your last query requires a fuzzy query. For example, searching for Cambell~0.5 could return Campbell as a match. The number after the tilde controls the fuzziness: older Lucene releases read it as a minimum similarity between 0 and 1, while since Lucene 4.0 the query parser expects a maximum edit distance of 0, 1, or 2.
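To make the idea of fuzziness concrete, here is a minimal Python sketch of matching by edit distance. It is not Lucene code and not how Lucene implements fuzzy queries internally; it uses a plain Levenshtein distance, and the function names, the sample player list, and the default of two allowed edits are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep a matching char
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(query: str, names: list[str], max_edits: int = 2) -> list[str]:
    """Return the names in which at least one lowercased, whitespace-split
    token is within max_edits of the lowercased query (hypothetical helper)."""
    q = query.lower()
    return [
        name for name in names
        if any(edit_distance(q, token) <= max_edits for token in name.lower().split())
    ]


# Illustrative sample data, not taken from the question.
players = ["Jason Campbell", "Calais Campbell", "Jason Witten", "Tom Brady"]
print(fuzzy_matches("Cambel", players))
```

The misspelled "Cambel" is two insertions away from "campbell" (a "p" and an "l"), so a tolerance of two edits is enough to find both Campbells, while the unrelated names stay out of the result.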
BTW, I would suggest using Solr, which provides spell-check and auto-suggest features, so you wouldn't have to reinvent the wheel. This is similar to Google's "Did you mean...?"