Question
I'm just getting started with Lucene.Net. I indexed 100,000 rows using the standard analyzer, ran some test queries, and noticed that plural queries don't return results if the original term was singular. I understand the Snowball analyzer adds stemming support, which sounds nice. However, I'm wondering whether there are any drawbacks to going with Snowball over standard. Am I losing anything by going with it? Are there any other analyzers worth considering?
Recommended answer
Yes. By using a stemmer such as Snowball, you lose information about the original form of your text, because the index stores the stemmed term rather than the word as written. Sometimes that is what you want, sometimes not.
For example, Snowball will stem "organization" into "organ", so a search for "organization" will return results containing "organ", without any scoring penalty.
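To make the trade-off concrete, here is a minimal sketch in plain Python. It does not use Lucene.Net and it does not implement the Snowball algorithm: `toy_stem`, its suffix list, and the three sample documents are hypothetical stand-ins chosen only to show how applying the same stemmer at index time and at query time merges different words into one term.

```python
# Toy illustration only: NOT Snowball and NOT the Lucene.Net API.
# The suffix list below is a hypothetical stand-in for a real stemmer's rules.
SUFFIXES = ("izations", "ization", "s")  # checked in this order, longest first


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]


def build_index(docs, stem):
    """Build a tiny inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem):
    """Return ids of documents containing any query term (same analysis as the index)."""
    hits = set()
    for term in analyze(query, stem):
        hits |= index.get(term, set())
    return hits


# Hypothetical sample documents.
DOCS = {
    1: "organ donor registry",
    2: "nonprofit organization",
    3: "pipe organs",
}

plain = build_index(DOCS, stem=False)
stemmed = build_index(DOCS, stem=True)
```

With the unstemmed index, `search(plain, "organs", stem=False)` finds only document 3, which is the plural problem from the question. With the stemmed index, `search(stemmed, "organs", stem=True)` finds all three documents, and so does `search(stemmed, "organization", stem=True)`: the plural mismatch is gone, but the index can no longer tell "organ" and "organization" apart.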
Whether this is appropriate for you depends on your content and on the type of queries you are supporting (for example, are the searches very basic, or are your users sophisticated and relying on search to filter results precisely?). You may also want to look into less aggressive stemmers, such as KStem.