Problem description
I'm building a searcher for a localities autocomplete, a simpler version of the one in Google Maps. Everything seemed to be working OK with the query I was using:
{
"query": {
"bool": {
"must": {
"multi_match": {
"query": "Ametlla",
"type": "best_fields",
"fields": [
"locality",
"alternative_names"
],
"operator": "and"
}
},
"filter": {
"term": {
"country_code": "ES"
}
}
}
}
}
The issue I discovered is related to a city from Spain: L'Ametlla de Mar.
/localities_index/localities/10088
{
"_index": "localities_index",
"_type": "localities",
"_id": "10088",
"_version": 1,
"_seq_no": 133,
"_primary_term": 4,
"found": true,
"_source": {
"country_code": "es",
"locality": "L'Ametlla de Mar",
"alternative_names": []
}
}
You can search for Ametlla and it is matched (see the following partial-name example query):
{
"query": {
"match": {
"locality": {
"query" : "Ametlla"
}
}
}
}
/localities_index/localities/10088/_explain
{
"_index": "localities_index",
"_type": "localities",
"_id": "10088",
"matched": true,
"explanation": {
"value": 3.3985975,
"description": "weight(locality:ametlla in 2) [PerFieldSimilarity], result of:",
"details": [
{
"value": 3.3985975,
"description": "score(freq=1.0), product of:",
"details": [
{
"value": 2.2,
"description": "boost",
"details": []
},
{
"value": 3.6686769,
"description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details": [
{
"value": 2,
"description": "n, number of documents containing term",
"details": []
},
{
"value": 97,
"description": "N, total number of documents with field",
"details": []
}
]
},
{
"value": 0.4210829,
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1.0,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0.75,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 9.0,
"description": "dl, length of field",
"details": []
},
{
"value": 7.5360823,
"description": "avgdl, average length of field",
"details": []
}
]
}
]
}
]
}
}
But if you search for its full name, L'Ametlla de Mar, it is not matched.
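The original post does not show the full-name request, but presumably it is the same match query with the complete name, something like:
{
  "query": {
    "match": {
      "locality": {
        "query": "L'Ametlla de Mar"
      }
    }
  }
}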
I've tried adding punctuation to token_chars, as I saw at https://stackoverflow.com/a/49362505, but it didn't work. So I tried adding ' as custom_token_chars, and it didn't work either.
/localities_index/_settings
{
"localities_index": {
"settings": {
"index": {
"number_of_shards": "1",
"provided_name": "localities_index",
"creation_date": "1596537683568",
"analysis": {
"analyzer": {
"autocomplete": {
"filter": [
"lowercase",
"asciifolding"
],
"tokenizer": "autocomplete"
},
"autocomplete_search": {
"filter": [
"lowercase",
"asciifolding"
],
"tokenizer": "lowercase"
}
},
"tokenizer": {
"autocomplete": {
"token_chars": [
"letter",
"digit"
],
"custom_token_chars": "'",
"min_gram": "1",
"type": "edge_ngram",
"max_gram": "15"
}
}
},
"number_of_replicas": "1",
"uuid": "lS3Ork2zSySYJbJYmx29aw",
"version": {
"created": "7040099"
}
}
}
}
}
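Not part of the original post, but a quick way to check what each of these analyzers actually produces for the problematic name is the standard _analyze API; a minimal sketch:
/localities_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "L'Ametlla de Mar"
}
/localities_index/_analyze
{
  "analyzer": "autocomplete_search",
  "text": "L'Ametlla de Mar"
}
Comparing the two token lists shows whether the apostrophe (or anything derived from it) survives tokenization on the index side, the search side, or neither.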
/localities_index/_mapping
{
"localities_index": {
"mappings": {
"properties": {
"alternative_names": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
},
"country_code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"locality": {
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "autocomplete_search"
}
}
}
}
}
Recommended answer
You can use the apostrophe token filter in your custom analyzer and apply it to the fields that contain apostrophes (locality here). Keep the match query you are already using: it analyzes the input with the analyzer configured for the field, so you will get the expected result.
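A minimal sketch of what that change could look like, assuming the built-in apostrophe filter is simply added in front of the existing filters on both analyzers (analysis settings are static, so the index has to be closed while they are updated, and already-indexed documents keep their old tokens until they are reindexed):
POST /localities_index/_close
PUT /localities_index/_settings
{
  "analysis": {
    "analyzer": {
      "autocomplete": {
        "tokenizer": "autocomplete",
        "filter": [
          "apostrophe",
          "lowercase",
          "asciifolding"
        ]
      },
      "autocomplete_search": {
        "tokenizer": "lowercase",
        "filter": [
          "apostrophe",
          "lowercase",
          "asciifolding"
        ]
      }
    }
  }
}
POST /localities_index/_open
Whether the filter is really needed on both the index and the search analyzer depends on where the apostrophe actually survives tokenization; the _analyze calls sketched above are an easy way to verify that after reindexing.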