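I'm using Elasticsearch 7.2.0 and I want to build search suggestions.
For example, I have the following 3 movie titles:
When I type "aven", it should return suggestions like: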
avengers
avengers infinity
avengers age
When I type "avengers inf", it should return:
avengers infinity war
avengers infinity war part 2
After going through a lot of Elasticsearch tutorials, I came up with this:
PUT movies
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {},
        "analyzer": {
          "keyword_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding",
              "trim"
            ],
            "char_filter": [],
            "type": "custom",
            "tokenizer": "keyword"
          },
          "edge_ngram_analyzer": {
            "filter": [
              "lowercase"
            ],
            "tokenizer": "edge_ngram_tokenizer"
          },
          "edge_ngram_search_analyzer": {
            "tokenizer": "lowercase"
          },
          "completion_analyzer": {
            "tokenizer": "keyword",
            "filter": "lowercase"
          }
        },
        "tokenizer": {
          "edge_ngram_tokenizer": {
            "type": "edge_ngram",
            "min_gram": 2,
            "max_gram": 5,
            "token_chars": [
              "letter"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "keywordstring": {
            "type": "text",
            "analyzer": "keyword_analyzer"
          },
          "edgengram": {
            "type": "text",
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "edge_ngram_search_analyzer"
          },
          "completion": {
            "type": "completion"
          }
        },
        "analyzer": "standard"
      },
      "completion_terms": {
        "type": "text",
        "fielddata": true,
        "analyzer": "completion_analyzer"
      }
    }
  }
}
with the following documents:
POST movies/_doc/1
{
  "name": "Spider-Man: Homecoming",
  "completion_terms": ["spider-man", "homecomming"]
}
POST movies/_doc/2
{
  "name": "Ant-man and the Wasp",
  "completion_terms": ["ant-man", "and", "the", "wasp"]
}
POST movies/_doc/3
{
  "name": "Avengers: Infinity War Part 2",
  "completion_terms": ["avangers", "infinity", "war", "part", "2"]
}
POST movies/_doc/4
{
  "name": "Captain Marvel",
  "completion_terms": ["captain", "marvel"]
}
POST movies/_doc/5
{
  "name": "Black Panther",
  "completion_terms": ["black", "panther"]
}
POST movies/_doc/6
{
  "name": "Avengers: Infinity War",
  "completion_terms": ["avangers", "infinity", "war"]
}
POST movies/_doc/7
{
  "name": "Thor: Ragnarok",
  "completion_terms": ["thor", "ragnarok"]
}
POST movies/_doc/8
{
  "name": "Guardians of the Galaxy Vol 2",
  "completion_terms": ["guardians", "of", "the", "galaxy", "vol", "2"]
}
POST movies/_doc/9
{
  "name": "Doctor Strange",
  "completion_terms": ["doctor", "strange"]
}
POST movies/_doc/10
{
  "name": "Captain America: Civil War",
  "completion_terms": ["captain", "america", "civil", "war"]
}
POST movies/_doc/11
{
  "name": "Ant-Man",
  "completion_terms": ["ant-man"]
}
POST movies/_doc/12
{
  "name": "Avengers: Age of Ultron",
  "completion_terms": ["avangers", "age", "of", "ultron"]
}
POST movies/_doc/13
{
  "name": "Guardians of the Galaxy",
  "completion_terms": ["guardians", "of", "the", "galaxy"]
}
POST movies/_doc/14
{
  "name": "Captain America: The Winter Soldier",
  "completion_terms": ["captain", "america", "the", "winter", "solider"]
}
POST movies/_doc/15
{
  "name": "Thor: The Dark World",
  "completion_terms": ["thor", "the", "dark", "world"]
}
POST movies/_doc/16
{
  "name": "Iron Man 3",
  "completion_terms": ["iron", "man", "3"]
}
POST movies/_doc/17
{
  "name": "Marvel’s The Avengers",
  "completion_terms": ["marvels", "the", "avangers"]
}
POST movies/_doc/18
{
  "name": "Captain America: The First Avenger",
  "completion_terms": ["captain", "america", "the", "first", "avanger"]
}
POST movies/_doc/19
{
  "name": "Thor",
  "completion_terms": ["thor"]
}
POST movies/_doc/20
{
  "name": "Iron Man 2",
  "completion_terms": ["iron", "man", "2"]
}
POST movies/_doc/21
{
  "name": "The Incredible Hulk",
  "completion_terms": ["the", "incredible", "hulk"]
}
POST movies/_doc/22
{
  "name": "Iron Man",
  "completion_terms": ["iron", "man"]
}
and this query:
POST movies/_search
{
  "suggest": {
    "movie-suggest-fuzzy": {
      "prefix": "avan",
      "completion": {
        "field": "name.completion",
        "fuzzy": {
          "fuzziness": 1
        }
      }
    }
  }
}
My query returns the full titles rather than fragments.
Best answer
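This is a good question, and it shows you have done a lot of research to get this working, but you don't need to make it this complicated by trying to handle everything inside ES. I had exactly the same use case and solved it with a combination of application-side logic and ES.
What you really need is a match query on the first (n-1) terms and a prefix query on the nth search term. If the search text is "aven", it is both the first and the nth term, so the prefix query goes on it. If the search text is "avengers inf", then "avengers" goes in the match query and the prefix query goes on "inf".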
I just indexed the documents you provided and tried both of the search terms you mentioned, and it works for both:
Index creation
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      }
    }
  }
}
Index 3 documents
{
  "name": "Avengers: Age of Ultron"
},
{
  "name": "Avengers: Infinity War Part 2"
},
{
  "name": "Avengers: Infinity War"
}
Search query (a match query on the first (n-1) terms, a prefix query on the nth term)
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": "avengers"
          }
        },
        {
          "prefix": {
            "name": "ag"
          }
        }
      ]
    }
  }
}
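Basically, in your application code you need to split the search text on whitespace and then build a bool query with a match clause for the first (n-1) terms and a prefix query for the nth term.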
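The splitting step described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name `build_autocomplete_query` and its `field` parameter are made up for this example, the field is assumed to use the standard analyzer (hence the lowercasing, because prefix queries are not analyzed), and each complete term gets its own match clause so that all of them are required.

```python
from typing import Any, Dict, List


def build_autocomplete_query(search_text: str, field: str = "name") -> Dict[str, Any]:
    """Build a bool query: match on the first (n-1) terms, prefix on the nth.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    # Prefix queries are not analyzed, while the standard analyzer lowercases
    # tokens at index time, so the input is lowercased here (assumption).
    terms: List[str] = search_text.lower().split()
    if not terms:
        # Nothing typed yet: return a query that matches no documents.
        return {"query": {"match_none": {}}}

    # Everything except the last word is treated as a complete term.
    *complete_terms, last_term = terms
    must: List[Dict[str, Any]] = [
        {"match": {field: term}} for term in complete_terms
    ]
    # The word still being typed becomes the prefix clause.
    must.append({"prefix": {field: last_term}})
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    import json

    print(json.dumps(build_autocomplete_query("avengers inf"), indent=2))
```

The returned dictionary is meant to be sent as the body of a `_search` request. For "aven" it contains only the prefix clause; for "avengers inf" it contains a match clause on "avengers" followed by a prefix clause on "inf".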
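Note that you don't even need edge n-gram analyzers and the other complex machinery at index time, which saves a lot of space in the index. You may, however, want to put a character limit on the prefix query, because it can be expensive when searching millions of documents: unlike a match query, it is not a token-to-token match.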
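One way to read the character limit mentioned above is as a minimum prefix length enforced in application code before the query is sent. The sketch below assumes that reading; `prefix_clause` and `MIN_PREFIX_CHARS` are hypothetical names, and the threshold of 2 is an arbitrary placeholder rather than a recommended value.

```python
from typing import Any, Dict, Optional

# Hypothetical threshold; tune it for your own data and latency budget.
MIN_PREFIX_CHARS = 2


def prefix_clause(last_term: str, field: str = "name",
                  min_chars: int = MIN_PREFIX_CHARS) -> Optional[Dict[str, Any]]:
    """Return a prefix clause, or None when the term is shorter than min_chars."""
    term = last_term.strip().lower()
    if len(term) < min_chars:
        # Too short: the caller can skip the prefix clause (or the whole search).
        return None
    return {"prefix": {field: term}}
```

A caller would add the clause to the bool query only when the result is not `None`, so a single typed character never triggers a prefix scan.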
This question about Elasticsearch autocompletion originally appeared on Stack Overflow: https://stackoverflow.com/questions/59663048/