I am trying to get synonyms working for my existing setup. Currently, I have the following settings:

PUT city
{
    "settings": {
        "analysis": {
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "autocomplete",
                    "filter": [
                        "lowercase",
                        "my_synonym_filter",
                        "german_normalization",
                        "my_ascii_folding"
                    ]
                },
                "autocomplete_search": {
                    "tokenizer": "lowercase",
                    "filter": [
                        "lowercase",
                        "my_synonym_filter",
                        "german_normalization",
                        "my_ascii_folding"
                    ]
                }
            },
            "filter": {
                "my_ascii_folding": {
                    "type": "asciifolding",
                    "preserve_original": true
                },
                "my_synonym_filter": {
                    "type": "synonym",
                    "ignore_case": "true",
                    "synonyms": [
                        "sankt, st => sankt"
                    ]
                }
            },
            "tokenizer": {
                "autocomplete": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15,
                    "token_chars": [
                        "letter",
                        "digit",
                        "symbol"
                    ]
                }
            }
        }
    },
    "mappings": {
        "city": {
            "properties": {
                "name": {
                    "type": "text",
                    "analyzer": "autocomplete",
                    "search_analyzer": "autocomplete_search"
                }
            }
        }
    }
}

In this city index I have documents such as St. Wolfgang and Sankt Wolfgang. For me, St. and Sankt are synonyms, so if I search for Sankt, both documents should be returned.

I created a new filter and added it to my autocomplete analyzer:
"my_synonym_filter": {
   "type": "synonym",
    "ignore_case": "true",
    "synonyms": [
        "sankt, st."
    ]
}

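So far so good. But I am facing the following problems:

First, the dot after st is clearly not analyzed at the moment and is therefore not searchable. For the synonym, however, the dot matters.

Second, if I search for sankt, the synonym st is matched as well, which returns every document that starts with st, for example Stuttgart. This also happens because the dot is ignored.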
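To make the second problem concrete, here is a rough Python toy model of the index-time chain from the first settings block (edge n-gram tokenizer, then lowercase, then synonym lookup). It is not Elasticsearch code: the function names, the plain dictionary lookup for the synonym rule, and treating only letters and digits as token characters are all assumptions for illustration, and Elasticsearch's real rule parsing is more involved. The point it sketches is that once words are cut into prefixes before the synonym lookup, the prefix st of Stuttgart looks exactly like the abbreviation st.

```python
# Toy model only: helper names and the simplified synonym handling are
# assumptions for illustration, not Elasticsearch internals.

def edge_ngram_tokenizer(text, min_gram=1, max_gram=15):
    """Split on anything that is not a letter or digit (so '.' and spaces
    act as separators), then emit every prefix of each word."""
    words, current = [], ""
    for ch in text:
        if ch.isalnum():
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)

    tokens = []
    for word in words:
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens

# Simplified reading of the rule "sankt, st => sankt".
SYNONYMS = {"st": "sankt"}

def index_tokens(text):
    """Lowercase each prefix, then replace it if it matches a synonym."""
    lowered = [t.lower() for t in edge_ngram_tokenizer(text)]
    return [SYNONYMS.get(t, t) for t in lowered]

for name in ["St. Wolfgang", "Stuttgart"]:
    print(name, "->", index_tokens(name))
```

In this model both names end up with a sankt token, which matches the observation that a search for sankt also returns Stuttgart.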

Do you have any idea how I can achieve this? If you need more information, please let me know.

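Update:

After the discussion, I made the following changes to my settings:

Changed the edge_ngram tokenizer to the standard tokenizer.

Added an edgeNGram filter and added it to my analyzer.

Removed the german_normalization and my_ascii_folding filters from my analyzers to simplify testing.
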
PUT city
{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase",
            "my_synonym_filter",
            "edge_filter"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "autocomplete",
          "filter": [
            "my_synonym_filter",
            "lowercase"
          ]
        }
      },
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        },
        "my_synonym_filter": {
          "type": "synonym",
          "ignore_case": "true",
          "synonyms": [
            "sankt, st => sankt"
          ]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "city": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}

I added these 3 documents to the index:
"name":"Sankt Wolfgang",
"name":"Stuttgart",
"name":"St. Wolfgang"

Query string -> Result
st      ->    "St. Wolfgang", "Stuttgart"
st.     ->    "St. Wolfgang", "Sankt Wolfgang"
sankt   ->    "St. Wolfgang", "Sankt Wolfgang"

Best answer

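This works fine for me. The key points are to make sure that you:

  • place the synonym filter after the lowercase filter
  • place the edge-n-gram filter at the end of the filter chain
  • use the edge-n-gram filter only at index time

So we create the index: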
    PUT city
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "autocomplete": {
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "my_synonym_filter",
                "edge_filter"
              ]
            },
            "autocomplete_search": {
              "tokenizer": "standard",
              "filter": [
                "lowercase",
                "my_synonym_filter"
              ]
            }
          },
          "filter": {
            "edge_filter": {
              "type": "edgeNGram",
              "min_gram": 1,
              "max_gram": 15
            },
            "my_synonym_filter": {
              "type": "synonym",
              "ignore_case": "true",
              "synonyms": [
                "sankt, st. => sankt"
              ]
            }
          }
        }
      },
      "mappings": {
        "city": {
          "properties": {
            "name": {
              "type": "text",
              "analyzer": "autocomplete",
              "search_analyzer": "autocomplete_search"
            }
          }
        }
      }
    }
    

    Then we index the data:
    PUT city/city/1
    {
      "name":"St. Wolfgang"
    }
    PUT city/city/2
    {
      "name":"Stuttgart"
    }
    PUT city/city/3
    {
      "name":"Sankt Wolfgang"
    }
    

    Finally, searching for st and for sankt returns only documents 1 and 3, but not document 2:
    POST city/_search?q=name:st
    POST city/_search?q=name:sankt
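
The reasoning behind that result can be sketched with a small Python toy model of the two analyzers above. This is not Elasticsearch code: the regex tokenizer is only a rough stand-in for the standard tokenizer, the dictionary is an assumed reading of the rule "sankt, st. => sankt" once the dot has been dropped, and all function names are hypothetical.

```python
import re

# Assumed effect of "sankt, st. => sankt" after the tokenizer drops the dot.
SYNONYMS = {"st": "sankt"}

def standard_tokens(text):
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)

def search_tokens(text):
    """autocomplete_search: tokenize, lowercase, then apply synonyms."""
    return [SYNONYMS.get(t.lower(), t.lower()) for t in standard_tokens(text)]

def index_tokens(text, min_gram=1, max_gram=15):
    """autocomplete: same chain, with edge n-grams produced last."""
    grams = []
    for tok in search_tokens(text):
        for n in range(min_gram, min(max_gram, len(tok)) + 1):
            grams.append(tok[:n])
    return grams

DOCS = {1: "St. Wolfgang", 2: "Stuttgart", 3: "Sankt Wolfgang"}

def search(query):
    """Return ids of documents sharing at least one token with the query."""
    wanted = search_tokens(query)
    return sorted(
        doc_id
        for doc_id, name in DOCS.items()
        if any(t in index_tokens(name) for t in wanted)
    )

print(search("st"))     # [1, 3]
print(search("sankt"))  # [1, 3]
print(search("stu"))    # [2]
```

Because the synonym is applied to whole words before the n-grams are cut, the prefix st of Stuttgart is never rewritten, while a query for st is rewritten to sankt. In this model Stuttgart therefore only shows up once a third letter (stu) has been typed.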
    

    Source: a similar question on Stack Overflow about synonyms for Sankt and St. in Elasticsearch: https://stackoverflow.com/questions/49552172/
