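I have a small test set with titles like the following:
'Santos (SAN) X Avaí FC (AVAI) - Canindé (São Paulo)'
If I search for words such as "san", "avai", "avaí", or "santos", it works fine. However, if I search for "Santos (SAN) X Avaí FC (AVAI)", it should return only 1 record, but it returns all 3.
This is the test data:
https://gist.github.com/PtkFerraro/83c4b693cf770c3320fe0530a4e1ddc7
This is the analyzer and mapping:
https://gist.github.com/PtkFerraro/eb3244bf8c589b234a13d7f2b693cf77
The search is as follows:
https://gist.github.com/PtkFerraro/c0f8ed300566cce3b5118fff1522a421
Thanks in advance.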
{
  "settings": {
    "analysis": {
      "analyzer": {
        "title_default_analyzer": {
          "type": "custom",
          "tokenizer": "title_tokenizer",
          "filter": ["lowercase", "brazilian_filter", "asciifolding"]
        },
        "title_snowball_analyzer": {
          "type": "custom",
          "tokenizer": "title_tokenizer",
          "filter": ["lowercase", "brazilian_filter", "asciifolding", "snowball"]
        },
        "title_shingle_analyzer": {
          "type": "custom",
          "tokenizer": "title_tokenizer",
          "filter": ["lowercase", "brazilian_filter", "shingle", "asciifolding"]
        },
        "title_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "title_tokenizer",
          "filter": ["lowercase", "brazilian_filter", "asciifolding", "edge_ngram_filter"]
        },
        "title_search_analyzer": {
          "type": "custom",
          "tokenizer": "title_tokenizer",
          "filter": ["lowercase", "brazilian_filter", "asciifolding"]
        }
      },
      "filter": {
        "brazilian_filter": {
          "type": "stemmer",
          "name": "brazilian",
          "token_chars": ["letter", "digit"]
        },
        "edge_ngram_filter": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 50,
          "token_chars": ["letter", "digit"]
        }
      },
      "tokenizer": {
        "title_tokenizer": {
          "type": "letter"
        }
      }
    }
  },
  "mappings": {
    "entersport": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "is_adult": {
          "type": "boolean"
        },
        "match_start": {
          "type": "date"
        },
        "match_title": {
          "type": "text",
          "fields": {
            "title": {
              "type": "text",
              "analyzer": "title_default_analyzer"
            },
            "snowball": {
              "type": "text",
              "analyzer": "title_snowball_analyzer"
            },
            "shingles": {
              "type": "text",
              "analyzer": "title_shingle_analyzer"
            },
            "ngrams": {
              "type": "text",
              "analyzer": "title_ngram_analyzer",
              "search_analyzer": "title_search_analyzer"
            }
          }
        }
      }
    }
  }
}
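Best answer
It looks like you want to support two different kinds of search: (1) matching single words such as san or avai against football match titles, and (2) exact phrase matching on part or all of a match title.
The query string query you are using is parsed into separate terms, and by default those terms are combined with OR. In your example, Santos (SAN) X Avaí FC (AVAI) contains the token X, which matches all 3 sample documents because each of them contains an X.
Your query: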
{
  "query_string": {
    "query": "Santos (SAN) X Avaí FC (AVAI)",
    "fields": [
      "title^10",
      "match_title.snowball^2",
      "match_title.shingles^2",
      "match_title.ngrams"
    ]
  }
}
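To check which tokens the search text actually produces, the _analyze API can be run against the index. The request body below is a sketch meant to be sent to POST /<your-index>/_analyze; the index name is a placeholder, and the analyzer name is taken from the settings in the question. With the letter tokenizer and the lowercase filter, a standalone x token would be expected in the output, though the exact tokens also depend on the stemmer.

```json
{
  "analyzer": "title_search_analyzer",
  "text": "Santos (SAN) X Avaí FC (AVAI)"
}
```

Running the same request with one of the other two titles as the text should show the shared x token that causes all 3 documents to match.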
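If you want to match the whole phrase, you need to express that in the query. Match phrase queries were suggested in the comments. You can also use a match query with operator set to and, which requires every term in the query to match. That lets you use the same query type to match both avai and Santos (SAN) X Avaí FC (AVAI). I think you may prefer something like the following: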
{
  "query": {
    "multi_match": {
      "query": "Santos (SAN) X Avaí FC (AVAI)",
      "fields": [
        "title^10",
        "match_title.snowball^2",
        "match_title.shingles^2",
        "match_title.ngrams"
      ],
      "operator": "and"
    }
  }
}
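For the match phrase alternative mentioned above, a minimal sketch might look like the following. It assumes that the match_title.title sub-field from the mapping in the question is the one to search; which field suits your data best is a judgment call.

```json
{
  "query": {
    "match_phrase": {
      "match_title.title": "Santos (SAN) X Avaí FC (AVAI)"
    }
  }
}
```

Unlike the and operator, a phrase query also requires the terms to appear in the same order as in the title, so it is stricter and would not match a search that lists the two teams in reverse.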
Regarding "elasticsearch - Elasticsearch multiple analyzers not working", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43902570/