elasticsearch - Why does Elasticsearch find matches case-insensitively

I have this index, with the following analysis settings:
"analysis" : {
  "filter" : {
    "meeteor_ngram" : {
      "type" : "nGram",
      "min_gram" : "2",
      "max_gram" : "15"
    }
  },
  "analyzer" : {
    "meeteor" : {
      "filter" : [ "meeteor_ngram" ],
      "tokenizer" : "standard"
    }
  }
},
and this document:
{
  "_index" : "test_global_search",
  "_type" : "meeting",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "LightBulb Innovation",
    "purpose" : "The others should listen the Innovators and also improve the current process.",
    "location" : "Projector should be set up.",
    "meeting_notes" : [ { "meeting_note_text" : "The immovator proposed to change the Bulb to Led." } ],
    "agenda_items" : [ { "text" : "Discuss The Lightning" } ]
  }
}
Even though I have not configured a lowercase filter or a lowercasing tokenizer, both of these queries return the document:

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "lightbulb"
        }
    }
}
'

curl -XGET 'localhost:9200/global_search/meeting/_search?pretty' -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "name": "Lightbulb"
        }
    }
}
'

Here is the mapping:

→ curl -XGET 'localhost:9200/global_search/_mapping/meeting?pretty'
{
  "global_search" : {
    "mappings" : {
      "meeting" : {
        "properties" : {
          "agenda_items" : {
            "properties" : {
              "text" : {
                "type" : "text",
                "analyzer" : "meeteor"
              }
            }
          },
          "location" : {
            "type" : "text",
            "analyzer" : "meeteor"
          },
          "meeting_notes" : {
            "properties" : {
              "meeting_note_text" : {
                "type" : "text",
                "analyzer" : "meeteor"
              }
            }
          },
          "name" : {
            "type" : "text",
            "analyzer" : "meeteor"
          },
          "purpose" : {
            "type" : "text",
            "analyzer" : "meeteor"
          }
        }
      }
    }
  }
}

Best answer

Both LightBulb and lightBulb will return your document because of the custom analyzer you created.

Check how your analyzer tokenizes the data.

GET global_search/_analyze
{
   "analyzer" : "meeteor",
   "text" : "LightBulb Innovation"
}

You will see output like the following:

{
 "tokens": [
  {
     "token": "Li",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Lig",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Ligh",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "Light",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
 .... other terms starting from Light

   {
     "token": "ig",      ======> tokens below this get matched when you run your query
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "igh",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  {
     "token": "ight",
     "start_offset": 0,
     "end_offset": 9,
     "type": "word",
     "position": 0
  },
  .... other tokens.

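Now, when you run a match query, the same custom analyzer is applied to the text you search for and tokenizes it in the way shown above. Tokens such as 'ig', 'igh' and many others then match. That is why match seems to ignore case: both queries share these lowercase n-grams with the indexed document.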
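The overlap described above can be reproduced outside Elasticsearch with a small Python sketch. It is only an approximation under stated assumptions: the `standard` tokenizer is stood in for by a whitespace split (sufficient for this sample text), and the helper names `ngrams` and `analyze` are hypothetical, not part of any Elasticsearch client.

```python
def ngrams(word, min_gram=2, max_gram=15):
    """Return every character n-gram of `word`, preserving case."""
    return {
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    }


def analyze(text, min_gram=2, max_gram=15):
    """Rough stand-in for the `meeteor` analyzer.

    Assumption: a whitespace split approximates the standard tokenizer
    for this input. No lowercasing is applied, as in the question.
    """
    tokens = set()
    for word in text.split():
        tokens |= ngrams(word, min_gram, max_gram)
    return tokens


indexed = analyze("LightBulb Innovation")

# A match query analyzes the search text with the same analyzer and,
# by default, matches if any resulting token is found in the index.
for query in ("lightbulb", "Lightbulb"):
    shared = sorted(indexed & analyze(query))
    print(query, "->", shared)
```

Both search strings share case-identical fragments such as `ig`, `igh` and `ight` with the indexed name, so each query finds at least one matching token even though nothing was lowercased.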

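In the case of a term query, no search analyzer is applied. It searches for the term exactly as given. If you search for LightBulb, it will be found among the tokens, but lightBulb will not be found.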
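The exact-lookup behaviour of a term query can be sketched the same way in Python. This is a simplified model rather than Elasticsearch itself: `ngrams` and `term_matches` are hypothetical helpers, and a whitespace split stands in for the `standard` tokenizer.

```python
def ngrams(word, min_gram=2, max_gram=15):
    """Return every character n-gram of `word`, preserving case."""
    return {
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    }


# Tokens the index would hold for the `name` field (simplified model).
indexed_tokens = set()
for word in "LightBulb Innovation".split():
    indexed_tokens |= ngrams(word)


def term_matches(term):
    """Model of a term query: the term is not analyzed, so it must
    be exactly equal to one of the indexed tokens."""
    return term in indexed_tokens


print(term_matches("LightBulb"))  # True: the whole word is itself a 9-gram
print(term_matches("lightBulb"))  # False: no indexed token starts with a lowercase "l"
```

Because the whole word is at most `max_gram` characters long, it is indexed as a token in its original casing, which is why only the exact-case term is found.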

Hope this clarifies your question.

For more detail, read up on term and match queries.

Regarding "elasticsearch - Why does Elasticsearch find matches case-insensitively", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43813734/