Exact word search in Solr


Problem description

I have a question that is closely related to this question.

In my schema I have a field:

<field name="text" type="textgen" indexed="true" stored="true" required="true"/>

This gives an exact match, i.e. stemming is disabled.

Is it possible, while the field is configured as textgen, to search for other variants of the word?

eat~0 returns similar-sounding words such as meat, beat, etc., but this is not what I want.

I'm starting to think that the only way to achieve this is to add another field with a type other than textgen, but if there is a simpler way I am very interested to hear it.

Recommended answer

Using copyField statements is the normal approach in Solr. Since stemming is exactly what you are asking for, this is what I recommend. You can set stored=false on the copy if you are worried about index size.
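The copyField approach could look roughly like the schema fragment below. This is only a sketch: the field name text_stemmed and the type name text_en_stemmed are made up for illustration, and the analyzer chain (standard tokenizer, lowercasing, Porter stemmer) is one plausible choice for English rather than something the answer prescribes.

```xml
<!-- Original field from the question: exact matching, no stemming -->
<field name="text" type="textgen" indexed="true" stored="true" required="true"/>

<!-- Hypothetical second field holding a stemmed copy of the same content.
     stored="false" keeps the index smaller; the value is still searchable. -->
<field name="text_stemmed" type="text_en_stemmed" indexed="true" stored="false"/>

<!-- Copy the raw input of "text" into "text_stemmed" at index time -->
<copyField source="text" dest="text_stemmed"/>

<!-- Hypothetical stemming field type; swap the stemmer to suit your language -->
<fieldType name="text_en_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, a query on text:eat would stay exact, while text_stemmed:eat should also match documents containing eats or eating. Existing documents need to be reindexed before the new field is populated.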

You might also use lemmatisation, which is the opposite of stemming: instead of reducing words to a stem, you add all of a word's inflected forms. This is typically performed on the search query, expanding e.g. eat to eat, eats, eating, etc.

The third alternative might be to use wildcard search, although I wouldn't encourage it, not least because it bypasses all schema-configured filters for the target field.

