Elasticsearch "pattern_replace": replacing whitespace during analysis

Question

Basically, I want to remove all whitespace and tokenize the whole string as a single token. (I will apply nGram on top of that later.)

Here are my index settings:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "whitespace_remove": {
            "type": "pattern_replace",
            "pattern": " ",
            "replacement": ""
          }
        },
        "analyzer": {
          "meliuz_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "whitespace_remove"
            ]
          }
        }
      }
    }
  }
}

Instead of "pattern": " ", I also tried "pattern": "\u0020" and \s.

But when I analyze the text "beleza na web", it still produces three separate tokens, "beleza", "na" and "web", instead of the single token "belezanaweb".

Recommended answer

An analyzer processes a string by tokenizing it first and then applying a series of token filters. Because you specified the standard tokenizer, the input has already been split into separate tokens by the time any filter runs. The pattern_replace filter is then applied to each of those tokens individually, and none of them contains a space to remove.
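The order of operations described above can be illustrated with a small Python simulation. This is not Elasticsearch code: the function names are made up for illustration, and the standard tokenizer is only approximated by splitting on whitespace, which is enough to show why the filter never sees a space.

```python
import re


def standard_like_tokenizer(text):
    # Rough stand-in for the standard tokenizer (assumption): split on whitespace.
    return text.split()


def keyword_tokenizer(text):
    # Emits the entire input as one token, spaces included.
    return [text]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def whitespace_remove(tokens):
    # Mirrors pattern_replace with pattern " " and replacement "".
    return [re.sub(" ", "", t) for t in tokens]


def analyze(text, tokenizer, filters):
    # Tokenize first, then run every token filter in order.
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


filters = [lowercase, whitespace_remove]
print(analyze("beleza na web", standard_like_tokenizer, filters))
# ['beleza', 'na', 'web']
print(analyze("beleza na web", keyword_tokenizer, filters))
# ['belezanaweb']
```

With the whitespace-splitting tokenizer the spaces are gone before whitespace_remove runs, so it has nothing to do. With the keyword-style tokenizer the filter receives the whole string and can strip the spaces.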

Use the keyword tokenizer instead of the standard tokenizer. The rest of the settings are fine. You can change them as follows:

{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "whitespace_remove": {
            "type": "pattern_replace",
            "pattern": " ",
            "replacement": ""
          }
        },
        "analyzer": {
          "meliuz_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "whitespace_remove",
              "nGram"
            ]
          }
        }
      }
    }
  }
}
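
One way to check the result is to send the same tokenizer and filter chain to the _analyze API. The sketch below only builds the request body with the Python standard library. It assumes an Elasticsearch version whose _analyze endpoint accepts inline filter definitions, and the helper name build_analyze_request is hypothetical. The nGram filter is left out on purpose so that the single merged token is visible in the response.

```python
import json


def build_analyze_request(text):
    # Body for a request to the _analyze endpoint (assumption: inline
    # token filter definitions are supported by the cluster version).
    return {
        "tokenizer": "keyword",
        "filter": [
            "lowercase",
            {"type": "pattern_replace", "pattern": " ", "replacement": ""},
        ],
        "text": text,
    }


body = build_analyze_request("beleza na web")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent with any HTTP client. If the analyzer behaves as described in the answer, the response should list one token, "belezanaweb".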


08-15 16:39