Problem description
I have made the following type definition in Solr:
<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
It should index values verbatim (no tokenization).
I add the value "skinny jeans" to my index.
When I run the following search query (URL-decoded for readability), I get no results:
http://myvm:8983/solr/mycore/select?q=*:*&fq=name:("skinny jeans")&wt=json&indent=true&debugQuery=true
You can see the URL is searching for everything (*:*) with a filter query for the exact value "skinny jeans".
I then add the value "jeans" to my index, and run a similar query with
&fq=name:("jeans")
And this time I do find the "jeans" element.
So it works for a single word, but not for multiple words. Why would this be? I'm searching for an exact value after all. It makes me suspect that the KeywordTokenizerFactory is doing something odd. Can anyone please advise why no results are being returned from such a basic setup?
Thanks
Recommended answer
This is because you are using KeywordTokenizerFactory for indexing, which keeps the value as it is: it applies no tokenization and emits the whole input as one token. At query time, however, you are using WhitespaceTokenizerFactory, which splits the input into separate tokens on whitespace.
So with KeywordTokenizerFactory the index holds "skinny jeans" as a single token.
WhitespaceTokenizerFactory, on the other hand, will create the tokens "skinny" and "jeans".
You can see the difference: it won't match. You are searching for "skinny", "jeans" against "skinny jeans". A single word such as "jeans" works only because both tokenizers produce the same single token for it.
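The mismatch can be illustrated with a small Python sketch. The functions below are simplified stand-ins written for this illustration, not Solr's actual tokenizer code, and `phrase_matches` is a hypothetical helper that only checks whether each query token exists as an indexed term (a real phrase query also checks positions).

```python
# Simplified stand-ins for the two Solr tokenizers; not Solr's actual code.

def keyword_tokenize(text):
    """Mimic KeywordTokenizerFactory: the whole input becomes one token."""
    return [text]


def whitespace_tokenize(text):
    """Mimic WhitespaceTokenizerFactory: split the input on whitespace."""
    return text.split()


def phrase_matches(indexed_value, query_value, query_tokenizer):
    """Return True if every query token exists as a term in the index.

    The index side always uses the keyword stand-in, as in the field type
    from the question. Position checks are deliberately left out.
    """
    indexed_terms = set(keyword_tokenize(indexed_value))
    query_terms = query_tokenizer(query_value)
    return all(term in indexed_terms for term in query_terms)


# Whitespace tokenizer at query time: "skinny" and "jeans" are not in the index.
print(phrase_matches("skinny jeans", "skinny jeans", whitespace_tokenize))  # False
# A single word yields the same token on both sides, so it matches.
print(phrase_matches("jeans", "jeans", whitespace_tokenize))  # True
# Keyword tokenizer on both sides: the single token matches.
print(phrase_matches("skinny jeans", "skinny jeans", keyword_tokenize))  # True
```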
You need to change either the index tokenizer or the query tokenizer.
If you want an exact match, use KeywordTokenizerFactory as the tokenizer for both indexing and querying, as shown below:
<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
You can check the tokens created at index time and at query time using the Solr analysis tool (the Analysis screen in the Solr Admin UI).
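The same check can also be requested over HTTP. The sketch below only builds the request URL; it assumes the field analysis request handler is reachable at `/analysis/field` on your core (older setups may need it registered in solrconfig.xml), and it reuses the host and core name from the question. `build_field_analysis_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_field_analysis_url(core_url, field_type, index_value, query_value):
    """Build a URL for Solr's field analysis handler (assumed at /analysis/field)."""
    params = {
        "analysis.fieldtype": field_type,    # field type whose analyzers to run
        "analysis.fieldvalue": index_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{core_url}/analysis/field?{urlencode(params)}"


url = build_field_analysis_url(
    "http://myvm:8983/solr/mycore",  # host and core taken from the question
    "text_phrase",
    "skinny jeans",
    "skinny jeans",
)
print(url)
```

Opening the printed URL in a browser or with an HTTP client should return the tokens each analyzer produces, which makes it easy to compare the index side with the query side.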