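Can someone explain how the synonym token filter works when a synonym is a multi-word expression and the tokenizer is whitespace? For example, suppose I have this simple mapping: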

PUT /test_index
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym" : {
                        "tokenizer" : "whitespace",
                        "filter" : ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "lenient": true,
                        "synonyms" : ["multi word, bar => baz"]
                    }
                }
            }
        }
    }
}

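I don't understand how the term "multi word" can be evaluated if the whitespace tokenizer splits it into two tokens, multi and word. As far as I understand, the synonym filter will never see "multi word" as a single term to look up in the synonym configuration. Any help is appreciated.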

Best answer

The answer can be found in this section of the documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/token-graphs.html
and in this blog post:
http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

The following token filters can add tokens that span multiple positions but only record a default positionLength of 1:

- synonym
- word_delimiter

This means these filters will produce invalid token graphs for streams containing such tokens.

Avoid using invalid token graphs for search. Invalid graphs can cause unexpected search results.
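
To relate this back to the question: the synonym rules are themselves analyzed, so `multi word` is stored as the two-token sequence `multi`, `word`, and the filter looks ahead in the token stream for that sequence instead of expecting a single `multi word` term. The Python sketch below only illustrates that idea and is not Lucene's implementation. The function names are made up for this example, it assumes explicit `a, b => c` rules with whitespace tokenization, and it ignores positions, offsets, and positionLength entirely.

```python
from typing import Dict, List, Tuple

Rules = Dict[Tuple[str, ...], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace, roughly like the whitespace tokenizer."""
    return text.split()


def parse_rules(rules: List[str]) -> Rules:
    """Turn 'a b, c => d' rules into {input token sequence: output tokens}.

    Assumption: only explicit '=>' rules with a single replacement are handled.
    """
    table: Rules = {}
    for rule in rules:
        left, right = rule.split("=>")
        output = whitespace_tokenize(right)
        for alternative in left.split(","):
            # The rule text goes through the same tokenizer as the input text,
            # so "multi word" becomes the key ("multi", "word").
            table[tuple(whitespace_tokenize(alternative))] = output
    return table


def apply_synonyms(tokens: List[str], table: Rules) -> List[str]:
    """Replace the longest matching token sequence at each point in the stream."""
    longest = max(len(key) for key in table)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Look ahead: try the longest candidate sequence first.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in table:
                out.extend(table[key])
                i += n
                break
        else:
            # No rule starts here, so the token passes through unchanged.
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    table = parse_rules(["multi word, bar => baz"])
    tokens = whitespace_tokenize("a multi word query with bar")
    print(apply_synonyms(tokens, table))
    # ['a', 'baz', 'query', 'with', 'baz']
```

The point of the sketch is that matching happens on sequences of tokens after tokenization, which is why a whitespace tokenizer does not prevent the `multi word` rule from firing. The position bookkeeping that it leaves out is exactly what the quoted warning about positionLength is concerned with.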

Regarding elasticsearch - how does Elasticsearch use the synonym token filter if the synonyms are multi-word?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/61467304/
