Lowercase filter factory does not work when docValues="true"

Problem description

I am trying to achieve case-insensitive sorting using Solr and ran into this issue.


...But when I get the search results, they are not sorted case-insensitively. All the capitalized results come first, followed by all the lowercase ones.

If I have these short names:

Banu

Ajay

anil

sudhir

Nilesh

It sorts them as: Ajay, Banu, Nilesh, anil, sudhir
...................
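
The ordering quoted above is what a plain code-point comparison produces, since uppercase letters sort before lowercase ones. The following is a minimal Python sketch of that behavior, not Solr code: it only illustrates why sorting on a lowercased key gives the desired order. The variable names are illustrative.

```python
# Names taken from the example above.
names = ["Banu", "Ajay", "anil", "sudhir", "Nilesh"]

# Plain string comparison orders by code point, so "A".."Z" sort before "a".."z".
case_sensitive = sorted(names)

# Sorting on a lowercased key yields the case-insensitive order.
case_insensitive = sorted(names, key=str.lower)

print(case_sensitive)    # ['Ajay', 'Banu', 'Nilesh', 'anil', 'sudhir']
print(case_insensitive)  # ['Ajay', 'anil', 'Banu', 'Nilesh', 'sudhir']
```

This is the reason the approaches discussed here all come down to sorting on a lowercased copy of the value.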

I followed that solution and made the following changes to my Solr schema.xml file (only the relevant field and field type are shown):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
	<types>
		...............
		<fieldType class="org.apache.solr.schema.TextField" name="TextField">
			<analyzer>
				<tokenizer class="solr.KeywordTokenizerFactory"/>
				<filter class="solr.LowerCaseFilterFactory"/>
			</analyzer>
		</fieldType>
		.............
	</types>
	<fields>
	.................
		<field indexed="true" multiValued="false" name="name" stored="true" type="TextField" docValues="true" />
	................
	</fields>
	<uniqueKey>id</uniqueKey>
</schema>

But that didn't solve the sorting issue. So I removed docValues="true" from the field definition and tried again. This time sorting worked fine, but I had to specify useFieldCache=true in the query.

Why is solr.LowerCaseFilterFactory not compatible with docValues="true"?

Is there any other way to make case-insensitive sorting work without removing docValues="true" and specifying useFieldCache=true?

Update:

I followed ericLavault's advice and implemented an Update Request Processor. But now I am facing the following issues:

1) We are using DSE Search, so I followed the method specified in this article.

Our current table schema:

CREATE TABLE IF NOT EXISTS test_data(
    id      UUID,
    nm      TEXT,
    PRIMARY KEY (id)
);

Solr schema:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<schema name="autoSolrSchema" version="1.5">
	<types>
		<fieldType class="org.apache.solr.schema.UUIDField" name="UUIDField"/>
		<fieldType class="org.apache.solr.schema.StrField" name="StrField"/>
	</types>
	<fields>
		<field indexed="true" multiValued="false" name="nm" stored="true" type="StrField" docValues="true"/>
		<field indexed="true" multiValued="false" name="id" stored="true" type="UUIDField"/>
		<field indexed="true" multiValued="false" name="nm_s" stored="true" type="StrField" docValues="true"/>
	</fields>
	<uniqueKey>id</uniqueKey>
</schema>

As advised, I converted nm to lowercase and inserted it as nm_s using the update request processor. Then I reloaded the schema and reindexed. But when I run this query:

select nm from test_data where solr_query='{"q": "(-nm:(sssss))" ,"paging":"driver","sort":"nm_s asc"}';

I get the following error:

...enable docvalues true n reindex or place useFieldCache=true...

2) How can I ensure that the value of nm_s is properly updated? Is there any way to see the value of nm_s?

3) Why am I getting the above error even though docValues is enabled?

Recommended answer

This issue probably comes from the fact that DocValues was originally designed to support unanalyzed types. It does not support TextField:

  • StrField and UUIDField:
    • If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type.
    • If the field is multi-valued, Lucene will use the SORTED_SET type.
  • Any Trie* numeric fields, date fields and EnumField:
    • If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type.
    • If the field is multi-valued, Lucene will use the SORTED_SET type.

(Quoted from https://cwiki.apache.org/confluence/display/solr/DocValues)

There is an issue in the Solr Jira to add docValues support for TextField (SOLR-8362), but it is still open and unassigned.

To make case-insensitive sorting work without removing docValues="true", you will have to use a string field type (solr.StrField). But since you can't define an <analyzer> on a string type, you will need an Update Request Processor to lowercase the input stream (or an equivalent, such as preprocessing the field content before sending the data to Solr).
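
As a rough illustration of the "preprocess before sending" alternative, here is a minimal Python sketch that adds a lowercased sort key to each document on the client side. The field names nm and nm_s come from the question. The helper name add_sort_key, the sample ids, and the JSON payload shape are assumptions made for illustration only. How the documents actually reach Solr (or DSE) is outside the sketch.

```python
import json


def add_sort_key(doc, source="nm", target="nm_s"):
    """Return a copy of doc with a lowercased copy of `source` stored in `target`.

    Hypothetical helper: the field names mirror the question, nothing here is a Solr API.
    """
    enriched = dict(doc)
    value = doc.get(source)
    if value is not None:
        enriched[target] = value.lower()
    return enriched


# Placeholder documents; the ids are made up for the example.
docs = [{"id": "1", "nm": "Banu"}, {"id": "2", "nm": "anil"}]

# Serialize the enriched documents, e.g. as the body of an update request.
payload = json.dumps([add_sort_key(d) for d in docs])
print(payload)
```

The original nm value is left untouched for display and search, while nm_s holds the normalized value that a docValues-enabled string field can sort on.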

If you want your field to be tokenized for search and also sorted using DocValues, you can use a copyField: keep your actual text field (without DocValues) for searching, and pair it with a string field to sort on (lowercased and with DocValues enabled).

