This article covers how to handle Elasticsearch wildcard search on a not_analyzed field.

Problem description

I have an index with the following settings and mapping:

{
  "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_keyword":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "analyzer":"analyzer_keyword",
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

I am struggling to implement wildcard search on the name field. My example data looks like this:

[
{"name": "SVF-123"},
{"name": "SVF-234"}
]

When I run the following query:

curl -XPOST 'http://localhost:9200/my_index/product/_search' -d '
{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "query": "*SVF-1*"
                }
            }
        }

    }
}'

It returns both SVF-123 and SVF-234. I think it is still tokenizing the data. It should return only SVF-123.

Can you help?

Thanks in advance.

Recommended answer

My solution adventure

My case started out as you can see in my question. Whenever I changed one part of my settings, one thing started to work but another stopped working. Here is the history of my solution:

1.) I indexed my data with the default settings. This means my data was analyzed by default, which caused a problem on my side. For example:

When a user searched for a keyword such as SVF-1, the system ran this query:

{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "analyze_wildcard": true,
                    "query": "*SVF-1*"
                }
            }
        }

    }
}

And the result:

SVF-123
SVF-234

This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even though 1 does not. I abandoned this approach and created a mapping that makes my fields not_analyzed:

{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

But my problem still remained.
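The tokenization issue from step 1 can be illustrated with a rough Python sketch. The `simple_tokenize` helper is hypothetical and only approximates what an analyzed string field does to these particular values (split on non-alphanumeric characters, then lowercase). It is not Elasticsearch's actual analyzer.

```python
import re

def simple_tokenize(text):
    """Hypothetical stand-in for an analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.split(r"[^0-9A-Za-z]+", text) if token]

docs = ["SVF-123", "SVF-234"]
query_tokens = simple_tokenize("SVF-1")

# Show which tokens each document has in common with the query.
for name in docs:
    shared = set(simple_tokenize(name)) & set(query_tokens)
    print(name, "shares tokens:", sorted(shared))
```

Under this simplification, both names share the token svf with the query, which is consistent with both documents being returned.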

2.) After a lot of research, I wanted to try another way and decided to use a wildcard query. My query was:

{
    "query": {
        "wildcard" : {
            "name" : {
                "value" : "*SVF-1*"
            }
        }
    },
    "filter": {
        "term": {"site": "pro_en_GB"}
    }
}

This query worked, but there was one problem. My fields are now not_analyzed and I am running a wildcard query, so matching is case sensitive. If I search for svf-1, it returns nothing, and users may well type the query in lowercase.
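The case-sensitivity problem can be reproduced outside Elasticsearch with a small Python sketch. It uses the standard library's `fnmatchcase` as a stand-in for wildcard matching against the exact stored value. That is an assumption made for illustration, not how Elasticsearch implements wildcard queries, and `wildcard_hits` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Values as they would be stored, unchanged, in a not_analyzed field.
names = ["SVF-123", "SVF-234"]

def wildcard_hits(pattern, values):
    """Return the values matching the pattern, comparing case-sensitively."""
    return [value for value in values if fnmatchcase(value, pattern)]

print(wildcard_hits("*SVF-1*", names))  # uppercase pattern
print(wildcard_hits("*svf-1*", names))  # lowercase pattern
```

The uppercase pattern matches only SVF-123, while the lowercase pattern matches nothing, which mirrors the behaviour described above.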

3.) I changed my document structure to:

{
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "type":"string",
              "index": "not_analyzed"
           },
           "nameLowerCase":{
              "type":"string",
              "index": "not_analyzed"
           },
           "site":{
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

I added one more field for name, called nameLowerCase. When indexing a document, I set its fields like this:

{
    "name": "SVF-123",
    "nameLowerCase": "svf-123",
    "site": "pro_en_GB"
}
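A minimal sketch of how such a document might be prepared before indexing, assuming the application builds it in Python. The `make_product_doc` helper is hypothetical and not part of the original answer.

```python
import json

def make_product_doc(name, site):
    """Build the document body, adding a lowercased copy of name (hypothetical helper)."""
    return {
        "name": name,                   # original value, used for display
        "nameLowerCase": name.lower(),  # lowercased copy, used for searching
        "site": site,
    }

doc = make_product_doc("SVF-123", "pro_en_GB")
print(json.dumps(doc, indent=4))
```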

Here, I convert the query keyword to lowercase and run the search against the new nameLowerCase field, while displaying the name field.

The final version of my query is:

{
    "query": {
        "wildcard" : {
            "nameLowerCase" : {
                "value" : "*svf-1*"
            }
        }
    },
    "filter": {
        "term": {"site": "pro_en_GB"}
    }
}
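The query side of this approach can be sketched in Python as follows. The `build_wildcard_search` helper is hypothetical. It only assembles a request body with the same shape as the query above, lowercasing the user's keyword first, and it does not send anything to Elasticsearch.

```python
import json

def build_wildcard_search(keyword, site):
    """Assemble a search body for the nameLowerCase field (hypothetical helper).

    Any * or ? already present in keyword is passed through and would act as a wildcard.
    """
    return {
        "query": {
            "wildcard": {
                "nameLowerCase": {"value": "*" + keyword.lower() + "*"}
            }
        },
        "filter": {"term": {"site": site}},
    }

body = build_wildcard_search("SVF-1", "pro_en_GB")
print(json.dumps(body, indent=4))
```

With this in place, a user typing SVF-1, svf-1, or Svf-1 produces the same pattern, so the result no longer depends on how the keyword was capitalized.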

Now it works. This problem can also be solved by using multi_field. My query contains a dash (-), and I faced some problems with it.

Many thanks to @Alex Brasetvik for his detailed explanation and effort.


08-06 16:57