I've just ventured into the seemingly simple but extremely complex world of searching. For an application, I am required to build a search mechanism for searching users by their names.

After reading numerous posts and articles, including:

How can I use Lucene for personal name (first name, last name) search?
http://dublincore.org/documents/1998/02/03/name-representation/
what's the best way to search a social network by prioritizing a users relationships first?
http://www.gossamer-threads.com/lists/lucene/java-user/120417
Lucene Index and Query Design Question - Searching People
Lucene Fuzzy Search for customer names and partial address

... and a few others I cannot find at the moment, and after getting at least indexing and basic search working on my machine, I have devised the following scheme for user searching:

1) Have a first, second and third name field and index those with Solr
2) Use edismax as the query parser for multi-field searching
3) Use a combination of normalization filters such as transliteration, Latin-to-ASCII conversion, etc.
4) Finally, use fuzzy search

Evidently, being very new to this, I am unsure whether the above is the best way to do it, and I would like to hear from users with more experience in this field.

I need to be able to match names in the following ways:

1) Accent folding: Jorn matches Jörn and vice versa
2) Alternative spellings: Karl matches Carl and vice versa
3) Shortened representations (I believe I do this with the SynonymFilterFactory): Sue matches Susanne, etc.
4) Levenshtein matching: Jonn matches John, etc.
5) Soundex matching: Elin matches Ellen

Any guidance, criticisms or comments are very welcome. Please let me know if this is possible ... or perhaps I'm just day-dreaming.
:)

EDIT

I must add that I also have a fullname field in case some people have long names. As an example from one of the posts: Jon Paul or Del Carmen should also match Jon Paul Del Carmen.

And since this is a new project, I can modify the schema and architecture any way I see fit, so there are very few restrictions.

Solution

It sounds like you are catering for a corpus with searches that you need to match very loosely?

If so, you will want to choose your fields and set different boosts on them to rank your results.

So have separate "copied" fields in Solr:

- one field for the exact full name (with filters)
- a multivalued field with the filters ASCIIFolding, Lowercase...
- a multivalued field with the SynonymFilterFactory, ASCIIFolding, Lowercase...
- a field with the PhoneticFilterFactory (with Caverphone or Double-Metaphone)

See also: more non-English Soundex discussion

Synonyms for names: I don't know if there is a public synonym db available.

Fuzzy searching: I've not found it useful; it uses Levenshtein distance. Other filters and indexing give better search relevance.

Unicode characters in names can be handled with the ASCIIFoldingFilterFactory.

You are describing solutions up front for expected use cases. If you want quality results, plan on tuning your search relevance.

This tuning will be especially valuable when attempting to match on synonyms such as MacDonald and McDonald (by plain Levenshtein distance, both this pair and Carl and Karl are a single edit apart).
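To make the "separate fields with different boosts" idea concrete, here is a minimal Python sketch of how an edismax request over such fields might be assembled. The field names (fullname_exact, name_folded, name_synonym, name_phonetic), the boost values, and the core URL are all assumptions made up for illustration; only the `q`, `defType`, `qf` and `rows` request parameters are standard Solr. The `fold_to_ascii` helper only approximates what ASCIIFoldingFilterFactory plus a lowercase filter do at index time, so treat it as a way to reason about expected matches, not as a replacement for the Solr analysis chain.

```python
import unicodedata
from urllib.parse import urlencode

# Hypothetical field names and boosts: the exact full name counts most,
# the phonetic field least. These numbers are starting points to tune.
FIELD_BOOSTS = {
    "fullname_exact": 10.0,
    "name_folded": 5.0,
    "name_synonym": 3.0,
    "name_phonetic": 1.0,
}


def fold_to_ascii(text: str) -> str:
    """Roughly mimic accent folding plus lowercasing.

    Decomposes characters (NFKD) and drops combining marks, so an
    accented letter collapses to its base letter. Solr's
    ASCIIFoldingFilterFactory covers more characters than this does.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def build_edismax_params(user_query: str, rows: int = 10) -> dict:
    """Build request parameters for an edismax query across the boosted fields."""
    qf = " ".join(f"{field}^{boost:g}" for field, boost in FIELD_BOOSTS.items())
    return {"q": user_query, "defType": "edismax", "qf": qf, "rows": rows}


def build_select_url(base_url: str, user_query: str) -> str:
    """Return a /select URL for the given core URL (the base URL is an assumption)."""
    return f"{base_url}/select?{urlencode(build_edismax_params(user_query))}"


if __name__ == "__main__":
    print(fold_to_ascii("Jörn"))
    print(build_select_url("http://localhost:8983/solr/users", "Jon Paul"))
```

Keeping the boosts in one dictionary makes the relevance tuning mentioned above a matter of adjusting a few numbers and re-running representative queries.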