Problem description
I have a field containing comma-separated values: for example, one document has JSON,AngularJS, another has AngularJS,JSON, and another has only JSON,HTML.
Now I have been trying to query Solr with fq=field:(JSONAngularJS*), but it returns only the record where JSON comes before AngularJS.
How can I query Solr so that it returns every record containing both JSON and AngularJS, regardless of their order?
I've attached the Solr analysis for the field:
Query format: http://localhost:8983/solr/my_core/select?fq=field:(JSON%20AND%20AngularJS)&q=*:*
Recommended answer
Use a field type that is tokenized on commas, so that each entry in your list becomes a separate token. You can do this with the Simplified Regular Expression Pattern Tokenizer, solr.SimplePatternTokenizerFactory:
<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.SimplePatternTokenizerFactory" pattern="[^,]+"/>
</analyzer>
</fieldType>
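To see what the tokenizer produces for each stored value, here is a minimal Python sketch that emulates the [^,]+ pattern locally with re.findall. It only approximates SimplePatternTokenizerFactory and does not involve Solr. The function name tokenize is hypothetical:

```python
import re

# Same pattern as the tokenizer definition: every maximal run of
# non-comma characters becomes one token.
TOKEN_PATTERN = re.compile(r"[^,]+")


def tokenize(value: str) -> list[str]:
    """Rough local emulation of SimplePatternTokenizerFactory with pattern [^,]+.

    Only the tokenizer is mimicked; the field type has no filters, so tokens
    keep their original case and any spaces inside an entry.
    """
    return TOKEN_PATTERN.findall(value)


if __name__ == "__main__":
    print(tokenize("JSON,AngularJS"))                         # ['JSON', 'AngularJS']
    print(tokenize("AngularJS,JSON"))                         # ['AngularJS', 'JSON']
    print(tokenize("JSON,AngularJS,Microsoft Visual Basic"))  # ['JSON', 'AngularJS', 'Microsoft Visual Basic']
```

The field type has no filters, so matching is case-sensitive. A value such as "JSON, AngularJS" (with a space after the comma) would also produce the token " AngularJS", leading space included.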
Then query the index for documents in which both tokens are present: fq=field:(JSON AND AngularJS).
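The following is a minimal sketch of building that request from Python. It assumes the core name my_core and the field name field from the question, and the helper build_select_url is hypothetical. It only builds the URL, using the standard library's urlencode to escape the spaces, parentheses and colons:

```python
from urllib.parse import urlencode

# Assumed values taken from the question; adjust for your own core.
BASE_URL = "http://localhost:8983/solr/my_core/select"


def build_select_url(field: str, required_tokens: list[str]) -> str:
    """Build a select URL whose fq requires every token to be present.

    Each AND clause is a separate term lookup on the tokenized field,
    so the order of the entries in the stored value does not matter.
    """
    fq = f"{field}:({' AND '.join(required_tokens)})"
    params = {"q": "*:*", "fq": fq}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("field", ["JSON", "AngularJS"]))
```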
(Update, after the question was edited)
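First, your field appears to be a string field rather than a TextField. A string field is not tokenized: the whole value (e.g. JSON,AngularJS) is indexed as a single term, so any pattern matched against it depends on the order of the entries.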
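The order dependence can be reproduced locally. This Python sketch assumes a wildcard pattern of the form *JSON*AngularJS*, which is a hypothetical stand-in because the exact query in the question is garbled. It uses fnmatchcase as a rough emulation of a wildcard match against a single, untokenized term:

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the question's (garbled) wildcard query.
PATTERN = "*JSON*AngularJS*"


def string_field_matches(value: str, pattern: str = PATTERN) -> bool:
    """Emulate a wildcard query on a string field.

    A string field stores the entire value as one term, so the pattern has
    to match the whole value, including the order of the entries.
    """
    return fnmatchcase(value, pattern)


if __name__ == "__main__":
    for value in ["JSON,AngularJS", "AngularJS,JSON", "JSON,HTML"]:
        print(value, string_field_matches(value))
    # JSON,AngularJS True
    # AngularJS,JSON False
    # JSON,HTML False
```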
To add a field type with the correct definition through the Schema API:
curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field-type" : {
    "name":"comma-separated-list",
    "class":"solr.TextField",
    "positionIncrementGap":"100",
    "analyzer" : {
      "tokenizer":{
        "class":"solr.SimplePatternTokenizerFactory", "pattern": "[^,]+" }
    }
  }
}' http://localhost:8983/solr/collectionname/schema
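For reference, here is a minimal Python sketch of the same Schema API call. It also sends an add-field command in the same request so that a field actually uses the new type. The field name langs comes from the sample documents below, and the URL is the placeholder from the curl command; both are assumptions. If the field already exists as a string field, you would use Solr's replace-field command instead of add-field, and the documents would need to be reindexed afterwards:

```python
import json
import urllib.request

# Placeholder URL copied from the curl command; "collectionname" is not real.
SCHEMA_URL = "http://localhost:8983/solr/collectionname/schema"


def build_schema_payload(field_name: str) -> dict:
    """Schema API body: add the comma-tokenized field type, then a field using it."""
    return {
        "add-field-type": {
            "name": "comma-separated-list",
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                "tokenizer": {
                    "class": "solr.SimplePatternTokenizerFactory",
                    "pattern": "[^,]+",
                }
            },
        },
        # Hypothetical field definition; "langs" matches the sample documents.
        "add-field": {
            "name": field_name,
            "type": "comma-separated-list",
            "stored": True,
        },
    }


def post_schema(payload: dict, url: str = SCHEMA_URL) -> dict:
    """POST the payload to the Schema API and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the body; call post_schema() against a running Solr to apply it.
    print(json.dumps(build_schema_payload("langs"), indent=2))
```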
After adding a set of sample documents:
[
{
"langs":"JSON,AngularJS,Microsoft Visual Basic",
"id":"foo",
"address":"None",
"_version_":1606953238273196032},
{
"langs":"JSON,AngularJS",
"id":"foo2",
"address":"None",
"_version_":1606953238277390336},
{
"langs":"JSON,Microsoft Visual Basic",
"id":"foo3",
"address":"None",
"_version_":1606953238278438912},
{
"langs":"AngularJS,JSON",
"id":"foo4",
"address":"None",
"_version_":1606953238278438913}]
Then querying with fq=langs:(JSON AND AngularJS)&q=*:* returns:
[
{
"langs":"JSON,AngularJS,Microsoft Visual Basic",
"id":"foo",
"address":"None",
"_version_":1606953238273196032},
{
"langs":"JSON,AngularJS",
"id":"foo2",
"address":"None",
"_version_":1606953238277390336},
{
"langs":"AngularJS,JSON",
"id":"foo4",
"address":"None",
"_version_":1606953238278438913}]
The document that doesn't contain AngularJS (foo3) has been left out.
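The filter's result on these four sample documents can be reproduced with a short Python sketch. It tokenizes each langs value on commas and keeps only the documents that contain every required token. This is a local emulation of the AND filter, not Solr itself, and matching_ids is a hypothetical helper:

```python
import re

# The four sample documents from above (only the fields needed here).
DOCS = [
    {"id": "foo", "langs": "JSON,AngularJS,Microsoft Visual Basic"},
    {"id": "foo2", "langs": "JSON,AngularJS"},
    {"id": "foo3", "langs": "JSON,Microsoft Visual Basic"},
    {"id": "foo4", "langs": "AngularJS,JSON"},
]

# Mirrors the [^,]+ pattern of the comma-separated-list field type.
TOKEN_PATTERN = re.compile(r"[^,]+")


def matching_ids(docs, required_tokens):
    """Emulate fq=langs:(A AND B ...): keep a document only if every
    required token is among its tokens, in any order."""
    required = set(required_tokens)
    return [
        doc["id"]
        for doc in docs
        if required <= set(TOKEN_PATTERN.findall(doc["langs"]))
    ]


if __name__ == "__main__":
    print(matching_ids(DOCS, ["JSON", "AngularJS"]))  # ['foo', 'foo2', 'foo4']
```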