I have this index:
"analysis" : {
  "filter" : {
    "meeteor_ngram" : { "type" : "nGram", "min_gram" : "2", "max_gram" : "15" }
  },
  "analyzer" : {
    "meeteor" : { "filter" : [ "meeteor_ngram" ], "tokenizer" : "standard" }
  }
},
And this document:
{
  "_index" : "test_global_search",
  "_type" : "meeting",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "LightBulb Innovation",
    "purpose" : "The others should listen the Innovators and also improve the current process.",
    "location" : "Projector should be set up.",
    "meeting_notes" : [ { "meeting_note_text" : "The immovator proposed to change the Bulb to Led." } ],
    "agenda_items" : [ { "text" : "Discuss The Lightning" } ]
  }
}
Although I have not configured a lowercase filter or a lowercasing tokenizer, both of these queries return the document:
curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "lightbulb"
}
}
}
'
curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match": {
"name": "Lightbulb"
}
}
}
'
Here is the mapping:
→ curl -XGET 'localhost:9200/global_search/_mapping/meeting?pretty'
{
"global_search" : {
"mappings" : {
"meeting" : {
"properties" : {
"agenda_items" : {
"properties" : {
"text" : {
"type" : "text",
"analyzer" : "meeteor"
}
}
},
"location" : {
"type" : "text",
"analyzer" : "meeteor"
},
"meeting_notes" : {
"properties" : {
"meeting_note_text" : {
"type" : "text",
"analyzer" : "meeteor"
}
}
},
"name" : {
"type" : "text",
"analyzer" : "meeteor"
},
"purpose" : {
"type" : "text",
"analyzer" : "meeteor"
}
}
}
}
}
}
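Best answer
Because of the custom analyzer you created, searching for either LightBulb or lightBulb will return your document.
Check how the analyzer tokenizes your data: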
GET global_search/_analyze?analyzer=meeteor
{
"text" : "LightBulb Innovation"
}
You will see output like this:
{
"tokens": [
{
"token": "Li",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "Lig",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "Ligh",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "Light",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
.... other terms starting from Light
{
"token": "ig", ======> tokens below this get matched when you run your query
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "igh",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
{
"token": "ight",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 0
},
.... other tokens.
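The token list above can be reproduced offline. Here is a minimal Python sketch that imitates what an n-gram token filter does with min_gram 2 and max_gram 15. It is not Elasticsearch code, and the helper name `ngrams` is made up for illustration. Note that nothing in it lowercases the input, just as nothing in the `meeteor` analyzer does.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return every substring of token with length between min_gram and max_gram.

    Hypothetical helper that imitates an n-gram token filter. Grams are
    ordered by start position, then by length, like the listing above.
    """
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(token):
                break  # longer grams would run past the end of the token
            grams.append(token[start:end])
    return grams


grams = ngrams("LightBulb")
print(grams[:4])         # the first grams, all starting at the uppercase "L"
print("ight" in grams)   # an all-lowercase gram taken from inside the word
print("light" in grams)  # not produced: the indexed word starts with "L"
```

The grams that contain no uppercase letter, such as `ig`, `igh` and `ight`, are the ones a differently cased search can still hit.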
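Now, when you run a match query, the same custom analyzer is applied to the text you search for and tokenizes it in the way shown above. Tokens such as 'ig', 'igh' and many others therefore match, regardless of how the first letter is cased. That is why match seems not to work the way you expect.
With a term query, no search analyzer is applied. The term is looked up exactly as given. If you search for LightBulb, it is found among the tokens, but lightBulb is not. Hope this clarifies your question.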
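To make the match versus term difference concrete, here is a rough Python simulation under simplifying assumptions: whitespace splitting stands in for the standard tokenizer, a document matches when at least one query token is among the indexed tokens, and scoring is ignored. The names `analyze`, `match_query` and `term_query` are hypothetical and are not Elasticsearch APIs.

```python
def analyze(text, min_gram=2, max_gram=15):
    """Split on whitespace, then emit every n-gram of each word (case preserved)."""
    tokens = set()
    for word in text.split():
        for start in range(len(word)):
            longest_end = min(start + max_gram, len(word))
            for end in range(start + min_gram, longest_end + 1):
                tokens.add(word[start:end])
    return tokens


# Tokens stored for the "name" field of the example document.
indexed = analyze("LightBulb Innovation")


def match_query(text):
    """Analyze the query text first, then match if any token is shared."""
    return bool(analyze(text) & indexed)


def term_query(term):
    """No analysis: the exact string must itself be an indexed token."""
    return term in indexed


print(match_query("lightbulb"))  # shares grams such as "ig" and "ight"
print(match_query("Lightbulb"))  # same shared grams, plus "Li", "Lig", ...
print(term_query("LightBulb"))   # the whole word is itself an indexed gram
print(term_query("lightBulb"))   # "LightBulb" and "ightBulb" are indexed, this is not
```

Under these assumptions both match queries succeed because of the shared lowercase grams, while the term lookup is sensitive to the case of every character.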
It is worth reading up on term and match queries.
Regarding "elasticsearch - Why does Elasticsearch find matches case-insensitively", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43813734/