Given the following documents in an index (call it addresses):

{
    ADDRESS: {
        ID: 1,
        LINE1: "steet 1",
        CITY: "kuala lumpur",
        COUNTRY: "MALAYSIA",
        ...
    }
}
{
    ADDRESS: {
        ID: 2,
        LINE1: "steet 1",
        CITY: "kualalumpur city",
        COUNTRY: "MALAYSIA",
        ...
    }
}
{
    ADDRESS: {
        ID: 3,
        LINE1: "steet 1",
        CITY: "kualalumpur",
        COUNTRY: "MALAYSIA",
        ...
    }
}
{
    ADDRESS: {
        ID: 4,
        LINE1: "steet 1",
        CITY: "kuala lumpur city",
        COUNTRY: "MALAYSIA",
        ...
    }
}
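At this point, I have a query that, for the search text "kualalumpur", returns "kualalumpur", "kuala lumpur", and "kualalumpur city".
However, "kuala lumpur city" is missing from the results, even though it is almost identical to "kualalumpur city".
This is my query so far: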
{
  "query": {
    "bool": {
      "should": [
          {"match": {"ADDRESS.STREET": {"query": "street 1", "fuzziness": 1, "operator": "AND"}}},
          {
            "bool": {
              "should": [
                {"match": {"ADDRESS.CITY": {"query": "kualalumpur", "fuzziness": 1, "operator": "OR"}}},
                {"match": {"ADDRESS.CITY.keyword": {"query": "kualalumpur", "fuzziness": 1, "operator": "OR"}}}
              ]
            }
          }
        ],
      "filter": {
        "bool": {
          "must": [
            {"term": {"ADDRESS.COUNTRY.keyword": "MALAYSIA"}}
          ]
        }
      },
      "minimum_should_match": 2
    }
  }
}
Given these conditions, is it possible for Elasticsearch to return all four documents for the search text "kualalumpur"?

Best answer

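You can use an edge n-gram tokenizer on the country field to get all four documents. I tried this in my local environment; a working example follows. (The example field is named country, but it holds the city values, so in your mapping it would presumably correspond to ADDRESS.CITY.)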
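For context on why the original query misses document 4, the sketch below approximates how a fuzziness of 1 behaves on the two city fields. It assumes that the ADDRESS.CITY text field is analyzed by lowercasing and splitting on whitespace (roughly what the default analyzer does for this data) and that ADDRESS.CITY.keyword stores the whole value as a single term. It uses plain Levenshtein distance; Elasticsearch also counts a transposition as one edit by default, which makes no difference for these values. The function names are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_city_match(query: str, city: str, max_edits: int = 1) -> bool:
    # Assumption: the text field yields lowercased whitespace-separated terms,
    # while the .keyword sub-field keeps the whole value as one term.
    text_terms = city.lower().split()
    keyword_term = city
    candidates = text_terms + [keyword_term]
    return any(edit_distance(query, term) <= max_edits for term in candidates)


cities = {
    1: "kuala lumpur",
    2: "kualalumpur city",
    3: "kualalumpur",
    4: "kuala lumpur city",
}
results = {doc_id: fuzzy_city_match("kualalumpur", city)
           for doc_id, city in cities.items()}
print(results)
```

Under these assumptions, "kuala lumpur" taken as a whole keyword is one edit away from the query (the space), which would explain why document 1 matches, whereas no term of "kuala lumpur city" comes within one edit of "kualalumpur".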
Create a custom analyzer and apply it to your field:

{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "ngram_analyzer": {
                        "type": "custom",
                        "filter": [
                            "lowercase"
                        ],
                        "tokenizer": "edgeNGramTokenizer"
                    }
                },
                "tokenizer": {
                    "edgeNGramTokenizer": {
                        "token_chars": [
                            "letter",
                            "digit"
                        ],
                        "min_gram": "1",
                        "type": "edgeNGram",
                        "max_gram": "40"
                    }
                }
            },
            "max_ngram_diff": "50"
        }
    },
    "mappings": {
        "properties": {
            "country": {
                "type": "text",
                "analyzer" : "ngram_analyzer"
            }
        }
    }
}
Index all four sample documents, for example:
{
  "country" : "kuala lumpur"
}
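One way to load all four documents in a single request is the _bulk API. The sketch below only builds the newline-delimited request body and does not contact a cluster. The ids and the index name "fuzzy" are taken from the search results shown in this answer, and build_bulk_body is a hypothetical helper, not part of any client library.

```python
import json

# Values and ids as they appear in the search results of this answer.
docs = {
    "1": "kuala lumpur",
    "2": "kualalumpur city",
    "3": "kualalumpur",
    "4": "kuala lumpur city",
}


def build_bulk_body(index: str, docs: dict) -> str:
    """Build an NDJSON body: one action line followed by one source line per document."""
    lines = []
    for doc_id, value in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"country": value}))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body("fuzzy", docs)
print(bulk_body)
```

The resulting string could be sent as the body of POST /_bulk with the header Content-Type: application/x-ndjson.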
A search query for the term kualalumpur matches all four documents:
{
    "query": {
        "match" : {
            "country" : "kualalumpur"
        }
    }
}

 "hits": [
      {
        "_index": "fuzzy",
        "_type": "_doc",
        "_id": "3",
        "_score": 5.0003963,
        "_source": {
          "country": "kualalumpur"
        }
      },
      {
        "_index": "fuzzy",
        "_type": "_doc",
        "_id": "2",
        "_score": 4.4082437,
        "_source": {
          "country": "kualalumpur city"
        }
      },
      {
        "_index": "fuzzy",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5621849,
        "_source": {
          "country": "kuala lumpur"
        }
      },
      {
        "_index": "fuzzy",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.4956103,
        "_source": {
          "country": "kuala lumpur city"
        }
      }
    ]
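To see why all four documents match, it can help to emulate the analyzer by hand. The sketch below is an approximation, not the actual Lucene implementation: it lowercases the text, splits on anything that is not a letter or digit, and emits every prefix of each word between min_gram and max_gram characters. Because the mapping sets no separate search analyzer, the query text is assumed to go through the same steps. The function name edge_ngrams is illustrative.

```python
import re


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 40) -> set:
    """Approximate the custom analyzer: lowercase, split on characters that
    are not letters or digits, then emit every prefix of each word."""
    tokens = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        for size in range(min_gram, min(len(word), max_gram) + 1):
            tokens.add(word[:size])
    return tokens


cities = {
    1: "kuala lumpur",
    2: "kualalumpur city",
    3: "kualalumpur",
    4: "kuala lumpur city",
}

# The query is analyzed the same way, so it also becomes a set of prefixes.
query_tokens = edge_ngrams("kualalumpur")

# A match query with the default OR operator needs only one shared token.
shared = {doc_id: len(edge_ngrams(city) & query_tokens)
          for doc_id, city in cities.items()}
print(shared)
```

Documents 2 and 3 share all 11 query prefixes, while documents 1 and 4 share only the five prefixes "k" through "kuala", which is loosely in line with the score gap in the results. One side effect to be aware of: with min_gram set to 1, any value containing a word that starts with "k" shares the token "k" and would also match, so raising min_gram or configuring a separate search_analyzer may be worth considering if the results turn out too broad.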


Regarding elasticsearch - Elasticsearch similar text query, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/62948239/
