ElasticSearch: Java API search returns no results when the input query has no wildcard (*)

Problem description

I am fetching documents from Elasticsearch using the Java API. My documents contain the following code, and I am trying to search for it with the patterns below.

Code: MS-VMA1615-0D

Input : *VMA1615-0*     -- Am getting the results (MS-VMA1615-0D).
Input : MS-VMA1615-0D   -- Am getting the results (MS-VMA1615-0D).
Input : *VMA1615-0      -- Am getting the results (MS-VMA1615-0D).
Input : *VMA*-0*        -- Am getting the results (MS-VMA1615-0D).

But if I give input like the one below, I get no results.

Input : VMA1615         -- Am not getting the results.

I expect the code MS-VMA1615-0D to be returned.

Please find below the Java code I am using:

// INDEX and TYPE are fields of the enclosing class; `code` holds the user's search input.
private final String INDEX = "products";
private final String TYPE = "doc";

SearchRequest searchRequest = new SearchRequest(INDEX);
searchRequest.types(TYPE);

// Build a query_string query that targets the "code" field by default.
QueryStringQueryBuilder qsQueryBuilder = new QueryStringQueryBuilder(code);
qsQueryBuilder.defaultField("code");

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(qsQueryBuilder);
searchSourceBuilder.size(50);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse;
try {
    searchResponse = SearchEngineClient.getInstance().search(searchRequest);
} catch (IOException e) {
    // Surface the failure instead of continuing with a null response.
    throw new java.io.UncheckedIOException("Search request failed", e);
}
Item item = null;
SearchHit[] searchHits = searchResponse.getHits().getHits();

Please find my mapping details:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}


Recommended answer

To do what you're looking for, you will probably have to change the tokenizer. You are currently using the whitespace tokenizer, which keeps MS-VMA1615-0D as a single token, so a query for VMA1615 alone has nothing to match. Replace it with the pattern tokenizer, which by default splits on non-word characters such as the hyphen. Your new mapping should look like this:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "pattern",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

So after changing your mapping, a query for VMA1615 will return MS-VMA1615-0D.

This works because the pattern tokenizer splits the string "MS-VMA1615-0D" into "MS", "VMA1615" and "0D". So whenever your query contains any one of them, it will return the document.

POST _analyze
{
  "tokenizer": "pattern",
  "text": "MS-VMA1615-0D"
}

will return:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA1615",
      "start_offset": 3,
      "end_offset": 10,
      "type": "word",
      "position": 1
    },
    {
      "token": "0D",
      "start_offset": 11,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
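
To see the difference between the two tokenizers without a running cluster, here is a minimal plain-Java sketch. It is only an approximation: the class TokenizerSketch and its methods tokenize and matches are hypothetical helpers, not part of the Elasticsearch API. They use java.util.regex to imitate the whitespace tokenizer (split on \s+) and the default pattern tokenizer (split on \W+), followed by lowercasing, and they treat the query as a single plain term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class TokenizerSketch {

    // Rough stand-in for a tokenizer plus the "lowercase" filter:
    // split on the separator regex and drop empty pieces.
    public static List<String> tokenize(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorRegex).split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // A plain (non-wildcard) term matches only if it equals an indexed token.
    public static boolean matches(String term, List<String> indexedTokens) {
        return indexedTokens.contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String code = "MS-VMA1615-0D";
        List<String> whitespace = tokenize(code, "\\s+"); // imitates "whitespace"
        List<String> pattern = tokenize(code, "\\W+");    // imitates default "pattern"

        // prints: [ms-vma1615-0d] -> false
        System.out.println(whitespace + " -> " + matches("VMA1615", whitespace));
        // prints: [ms, vma1615, 0d] -> true
        System.out.println(pattern + " -> " + matches("VMA1615", pattern));
    }
}
```

With the whitespace split the only indexed token is the whole code, so only the full code (or a wildcard) can match it; with the pattern split, VMA1615 is a token of its own.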

As per your comment:

To do that, use the mapping below:

PUT products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "-|\\d"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "code": {
          "type": "text",
          "analyzer": "custom_analyzer"
        }
      }
    }
  }
}

To check:

POST products/_analyze
{
  "tokenizer": "my_pattern_tokenizer",
  "text": "MS-VMA1615-0D"
}

will produce:

{
  "tokens": [
    {
      "token": "MS",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "VMA",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 1
    },
    {
      "token": "D",
      "start_offset": 12,
      "end_offset": 13,
      "type": "word",
      "position": 2
    }
  ]
}
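
The effect of the custom pattern can also be approximated in plain Java. This is a sketch under assumptions: CustomPatternSketch and split are hypothetical names, and the code only imitates the tokenizer with java.util.regex (it ignores the lowercase and asciifolding filters). It also assumes the search text is analyzed with the same analyzer as the field, which is the default when no separate search analyzer is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class CustomPatternSketch {

    // Imitates a "pattern" tokenizer: the regex marks separators, and the
    // empty pieces between adjacent separators are discarded.
    public static List<String> split(String text, String separatorRegex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(separatorRegex).split(text)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String separators = "-|\\d"; // a hyphen or any single digit

        // Indexed code: prints [MS, VMA, D]
        System.out.println(split("MS-VMA1615-0D", separators));

        // Search text run through the same pattern: digits vanish here too.
        System.out.println(split("VMA1615", separators)); // prints [VMA]
        System.out.println(split("VMA9999", separators)); // prints [VMA]
    }
}
```

One consequence worth keeping in mind: because every digit is a separator, inputs that differ only in their digits reduce to the same tokens, so this pattern matches on the letter parts of the code only.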

06-29 21:01