I'm using Elasticsearch 7.2.0 and I want to build search suggestions.

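For example, I have the following 3 movie titles:

Avengers: Infinity War
Avengers: Infinity War Part 2
Avengers: Age of Ultron
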
When I type "aven", it should return suggestions like:

avengers
avengers infinity
avengers age

When I type "avengers inf", it should return:

avengers infinity war
avengers infinity war part 2

After going through a lot of Elasticsearch tutorials, I came up with this:

PUT movies
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding",
              "trim"
            ],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          },
          "completion_analyzer": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": [
              "letter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keywordstring": {
            "type": "text",
            "analyzer": "keyword_analyzer"
          },
          "edgengram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "edge_ngram_search_analyzer"
          },
          "completion": {
            "type": "completion"
          }
        },
        "analyzer": "standard"
      },
      "completion_terms": {
        "type": "text",
        "fielddata": true,
        "analyzer": "completion_analyzer"
      }
    }
  }
}
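To see what the edgengram subfield actually indexes, here is a rough Python simulation of the question's edge_ngram_analyzer (edge_ngram tokenizer with min_gram 2, max_gram 5 and token_chars letter, followed by the lowercase filter) applied to one title. It is an approximation written for illustration, not Elasticsearch's own code; running the _analyze API against the real index is the authoritative check.

```python
import re
from typing import List

def edge_ngrams(text: str, min_gram: int = 2, max_gram: int = 5) -> List[str]:
    """Approximate edge_ngram tokenizer (token_chars: letter) + lowercase filter."""
    grams = []
    # token_chars ["letter"]: runs of letters form words, everything else is a boundary
    for word in re.findall(r"[^\W\d_]+", text):
        # emit prefixes of length min_gram..max_gram from the start of each word;
        # words shorter than min_gram produce nothing
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n].lower())
    return grams

print(edge_ngrams("Avengers: Infinity War Part 2"))
```

One consequence visible in the output: whole words longer than max_gram, such as "avengers", never appear as grams in name.edgengram, while edge_ngram_search_analyzer (lowercase tokenizer) still searches for the whole word.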

And the following documents:
POST movies/_doc/1
{
  "name": "Spider-Man: Homecoming",
  "completion_terms": [
    "spider-man",
    "homecomming"
  ]
}

POST movies/_doc/2
{
  "name": "Ant-man and the Wasp",
  "completion_terms": [
    "ant-man",
    "and",
    "the",
    "wasp"
  ]
}

POST movies/_doc/3
{
  "name": "Avengers: Infinity War Part 2",
  "completion_terms": [
    "avangers",
    "infinity",
    "war",
    "part",
    "2"
  ]
}

POST movies/_doc/4
{
  "name": "Captain Marvel",
  "completion_terms": [
    "captain",
    "marvel"
  ]
}

POST movies/_doc/5
{
  "name": "Black Panther",
  "completion_terms": [
    "black",
    "panther"
  ]
}

POST movies/_doc/6
{
  "name": "Avengers: Infinity War",
  "completion_terms": [
    "avangers",
    "infinity",
    "war"
  ]
}

POST movies/_doc/7
{
  "name": "Thor: Ragnarok",
  "completion_terms": [
    "thor",
    "ragnarok"
  ]
}

POST movies/_doc/8
{
  "name": "Guardians of the Galaxy Vol 2",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy",
    "vol",
    "2"
  ]
}

POST movies/_doc/9
{
  "name": "Doctor Strange",
  "completion_terms": [
    "doctor",
    "strange"
  ]
}

POST movies/_doc/10
{
  "name": "Captain America: Civil War",
  "completion_terms": [
    "captain",
    "america",
    "civil",
    "war"
  ]
}

POST movies/_doc/11
{
  "name": "Ant-Man",
  "completion_terms": [
    "ant-man"
  ]
}

POST movies/_doc/12
{
  "name": "Avengers: Age of Ultron",
  "completion_terms": [
    "avangers",
    "age",
    "of",
    "ultron"
  ]
}

POST movies/_doc/13
{
  "name": "Guardians of the Galaxy",
  "completion_terms": [
    "guardians",
    "of",
    "the",
    "galaxy"
  ]
}

POST movies/_doc/14
{
  "name": "Captain America: The Winter Soldier",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "winter",
    "solider"
  ]
}

POST movies/_doc/15
{
  "name": "Thor: The Dark World",
  "completion_terms": [
    "thor",
    "the",
    "dark",
    "world"
  ]
}

POST movies/_doc/16
{
  "name": "Iron Man 3",
  "completion_terms": [
    "iron",
    "man",
    "3"
  ]
}

POST movies/_doc/17
{
  "name": "Marvel’s The Avengers",
  "completion_terms": [
    "marvels",
    "the",
    "avangers"
  ]
}

POST movies/_doc/18
{
  "name": "Captain America: The First Avenger",
  "completion_terms": [
    "captain",
    "america",
    "the",
    "first",
    "avanger"
  ]
}

POST movies/_doc/19
{
  "name": "Thor",
  "completion_terms": [
    "thor"
  ]
}

POST movies/_doc/20
{
  "name": "Iron Man 2",
  "completion_terms": [
    "iron",
    "man",
    "2"
  ]
}

POST movies/_doc/21
{
  "name": "The Incredible Hulk",
  "completion_terms": [
    "the",
    "incredible",
    "hulk"
  ]
}

POST movies/_doc/22
{
  "name": "Iron Man",
  "completion_terms": [
    "iron",
    "man"
  ]
}

And the query:
POST movies/_search
{
  "suggest": {
    "movie-suggest-fuzzy": {
      "prefix": "avan",
      "completion": {
        "field": "name.completion",
        "fuzzy": {
          "fuzziness": 1
        }
      }
    }
  }
}

My query returns the full titles instead of fragments.

Best answer

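This is a good question, and it shows you did a lot of research to make it work, but you don't need to make it this complicated (by trying to handle everything in ES). I have exactly the same use case and solved it with a combination of application-side logic and ES.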

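What you really need is a match query on the first (n-1) terms of the search input and a prefix query on the nth term. For "aven", which is both the first and the nth term, only the prefix query applies. For the search term "avengers inf", "avengers" goes into the match query and the prefix query is applied to "inf".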
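The splitting step described above can be sketched in Python as follows. split_search_term is a hypothetical helper name, and whitespace is assumed to be the term boundary, as the answer suggests for the application code.

```python
from typing import List, Tuple

def split_search_term(search: str) -> Tuple[List[str], str]:
    """Split user input into the first (n-1) terms (for match queries)
    and the nth term (for the prefix query)."""
    terms = search.split()  # split on any run of whitespace
    if not terms:
        raise ValueError("empty search term")
    return terms[:-1], terms[-1]

print(split_search_term("aven"))          # ([], 'aven')
print(split_search_term("avengers inf"))  # (['avengers'], 'inf')
```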

I indexed the documents you provided and tried both search terms you mentioned, and both work:

Index creation

{
    "mappings": {
        "properties": {
            "name": {
                "type": "text"
            }
        }
    }
}

Indexing the 3 documents
{
  "name" : "Avengers: Age of Ultron"
},
{
  "name" : "Avengers: Infinity War Part 2"
},
{
  "name" : "Avengers: Infinity War"
}
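The three documents above are listed as separate bodies. One way to index them in a single request is the _bulk API, whose body is newline-delimited JSON: an action line followed by the document source, ending with a final newline. The sketch below builds that body in Python; the index name movies is an assumption borrowed from the question, since the answer doesn't name its index, and bulk_body is a hypothetical helper.

```python
import json

def bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for POST _bulk: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # the bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"

docs = [
    {"name": "Avengers: Age of Ultron"},
    {"name": "Avengers: Infinity War Part 2"},
    {"name": "Avengers: Infinity War"},
]
print(bulk_body("movies", docs), end="")
```

The resulting text would be sent as POST _bulk with the header Content-Type: application/x-ndjson.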

Search query (a match query on the first (n-1) terms and a prefix query on the nth term; this one corresponds to the search term "avengers ag"):
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "name": "avengers"
                    }
                },
                {
                    "prefix": {
                        "name": "ag"
                    }
                }
            ]
        }
    }
}

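Basically, in your application code you need to split the search term on whitespace and then build a bool (boolean) query with match clauses for the first (n-1) terms and a prefix query for the nth term.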
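A minimal sketch of that application-side step in Python, assuming the field is name and the input is split on whitespace. build_suggest_query is a hypothetical helper; it only builds the request body, which you would send to the index's _search endpoint with whatever client you use. Lowercasing the last term is an addition not in the answer: in Elasticsearch 7.2 the prefix query is not analyzed, while the standard analyzer lowercases the indexed tokens.

```python
import json
from typing import Any, Dict

def build_suggest_query(search: str, field: str = "name") -> Dict[str, Any]:
    """Build a bool query: a match clause for each of the first (n-1) terms
    and a prefix clause for the nth (last, possibly incomplete) term."""
    terms = search.split()
    if not terms:
        raise ValueError("empty search term")
    must = [{"match": {field: term}} for term in terms[:-1]]
    # prefix is a term-level query and is not analyzed, so lowercase it here
    # to line up with the lowercase tokens produced by the standard analyzer
    must.append({"prefix": {field: terms[-1].lower()}})
    return {"query": {"bool": {"must": must}}}

print(json.dumps(build_suggest_query("avengers ag"), indent=2))
```

This emits one match clause per leading term, so all of them must match. Passing the leading terms as a single match string would instead use the match query's default OR behaviour; which one fits better depends on how strict the suggestions should be.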

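Note that you don't even need edge n-gram analyzers or other complex setup at index time, which saves a lot of space in the index. You may, however, want to put a character limit on the prefix query term, because a prefix query can be expensive when searching millions of documents: unlike the match query, it is not a token-to-token match.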
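One way to apply such a character limit, sketched in Python: only add the prefix clause when the last term reaches a minimum length. MIN_PREFIX_LEN and build_guarded_query are hypothetical; the value 2 is only a placeholder, and dropping the prefix clause (rather than, say, waiting for more input) is one design choice among several.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold: shorter prefixes are not sent as a prefix query.
MIN_PREFIX_LEN = 2

def build_guarded_query(search: str, field: str = "name") -> Optional[Dict[str, Any]]:
    """Like the plain bool query, but only adds the prefix clause when the last
    term has at least MIN_PREFIX_LEN characters. Returns None when there is
    nothing worth sending."""
    terms = search.split()
    if not terms:
        return None
    *head, last = terms
    must = [{"match": {field: term}} for term in head]
    if len(last) >= MIN_PREFIX_LEN:
        # prefix queries are not analyzed in 7.2, so lowercase the input here
        must.append({"prefix": {field: last.lower()}})
    elif not must:
        # a lone very short prefix could expand to many terms: skip the request
        return None
    return {"query": {"bool": {"must": must}}}

print(build_guarded_query("a"))           # None
print(build_guarded_query("avengers i"))  # only the match clause is kept
```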

A similar question about autocomplete in Elasticsearch can be found on Stack Overflow: https://stackoverflow.com/questions/59663048/

10-16 12:29