I am using the Java API to fetch documents from Elasticsearch. My Elasticsearch documents contain the following code, and I am trying to search for it with the patterns below.
code : MS-VMA1615-0D

Input : *VMA1615-0*     -- I get the result (MS-VMA1615-0D).
Input : MS-VMA1615-0D   -- I get the result (MS-VMA1615-0D).
Input : *VMA1615-0      -- I get the result (MS-VMA1615-0D).
Input : *VMA*-0*        -- I get the result (MS-VMA1615-0D).

However, with the input below I get no results.
Input : VMA1615         -- I get no results.

I expect the code MS-VMA1615-0D to be returned.
Below is the Java code I am using:
private final String INDEX = "products";
private final String TYPE = "doc";

    SearchRequest searchRequest = new SearchRequest(INDEX);
    searchRequest.types(TYPE);
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // "code" holds the user input, e.g. "VMA1615"
    QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);

    qsQueryBuilder.defaultField("code");
    searchSourceBuilder.query(qsQueryBuilder);

    searchSourceBuilder.size(50);
    searchRequest.source(searchSourceBuilder);
    SearchResponse searchResponse = null;
    try {
        searchResponse = SearchEngineClient.getInstance().search(searchRequest);
    } catch (IOException e) {
        // Report the failure instead of silently discarding the message
        System.err.println("Search failed: " + e.getLocalizedMessage());
    }
    Item item = null;
    // Avoid a NullPointerException when the search call threw
    SearchHit[] searchHits = searchResponse == null
            ? new SearchHit[0]
            : searchResponse.getHits().getHits();

Here are my mapping details:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

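Best answer

To do what you want, you will probably have to change the tokenizer you use. You are currently using the whitespace tokenizer, which keeps "MS-VMA1615-0D" as a single token; replace it with the pattern tokenizer.
Your new mapping should then look like this: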

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

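After changing the mapping (and reindexing the documents so the new analyzer is applied), a query for VMA1615 returns MS-VMA1615-0D.

This is because the pattern tokenizer splits on non-word characters by default, so the string "MS-VMA1615-0D" is tokenized into "MS", "VMA1615" and "0D". A query for any one of these tokens therefore matches the document. You can check the tokenization like this: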
POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}

which returns:
{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
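
To make the difference between the two tokenizers concrete, here is a minimal Java sketch that imitates them with plain java.util.regex. It is only an approximation and does not call Elasticsearch: the class TokenizerSketch and the helpers tokenize and matchesTerm are hypothetical names, "\\s+" stands in for the whitespace tokenizer, "\\W+" is the documented default pattern of the pattern tokenizer, and the lowercasing mimics the "lowercase" filter from the mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizerSketch {
    // Split on a separator regex, drop empty pieces and lowercase each token
    // (an approximation of tokenizer + "lowercase" filter, not the real analyzer).
    static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(separatorRegex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A plain (non-wildcard) query term only matches if it equals an indexed token.
    static boolean matchesTerm(List<String> indexedTokens, String term) {
        return indexedTokens.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        List<String> whitespace = tokenize(code, "\\s+"); // stand-in for "whitespace"
        List<String> pattern = tokenize(code, "\\W+");    // stand-in for "pattern"
        System.out.println(whitespace + " -> " + matchesTerm(whitespace, "VMA1615"));
        System.out.println(pattern + " -> " + matchesTerm(pattern, "VMA1615"));
    }
}
```

With the whitespace stand-in the whole code stays one token, so VMA1615 only matches when wrapped in wildcards; with the pattern stand-in VMA1615 is a token of its own.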

Based on your comment:



For that, use the following mapping:
PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

To check:
POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}

which produces:
{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
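
The same kind of rough sketch for the custom pattern "-|\\d": every hyphen and every digit acts as a separator, so digits never end up in a token. Again this is plain java.util.regex rather than Elasticsearch, and CustomPatternSketch and analyze are hypothetical names. It assumes empty pieces are dropped (as the _analyze output above suggests) and that query text goes through the same analyzer as the indexed text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class CustomPatternSketch {
    // Hypothetical helper: split on "-" or any digit, drop empty pieces, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("-|\\d")) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("MS-VMA1615-0D")); // [ms, vma, d]
        System.out.println(analyze("VMA"));           // [vma]
        System.out.println(analyze("1615"));          // []
    }
}
```

Under these assumptions a search for VMA finds the document, while a purely numeric input such as 1615 yields no tokens at all, so keep the first mapping if you also need to search by the numeric part.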

09-27 14:56