Problem description
In Elasticsearch, how do I search for an arbitrary substring, perhaps including spaces? (Searching for part of a word isn't quite enough; I want to search any substring of an entire field.)
I imagine it has to be in a keyword field, rather than a text field.
Suppose I have only a few thousand documents in my Elasticsearch index, and I try:
"query": {
"wildcard" : { "description" : "*plan*" }
}
That works as expected--I get every item where "plan" is in the description, even ones like "supplantation".
Now, I want to do:
"query": {
"wildcard" : { "description" : "*plan is*" }
}
...so that I might match documents with "Kaplan isn't" among many other possibilities.
It seems this isn't possible with wildcard, match prefix, or any other query type I might see. How do I simply search on any substring? (In SQL, I would just do description LIKE '%plan is%')
(I am aware any such query would be slow or perhaps even impossible for large data sets.)
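A likely reason the second wildcard query finds nothing is that a text field is split into separate terms at index time, and a wildcard pattern is compared against each term on its own, so no single term ever contains a space. The Python sketch below only imitates that behavior; rough_standard_tokens is a hypothetical, much simplified stand-in for the real analyzer, and fnmatchcase stands in for wildcard matching.

```python
import re
from fnmatch import fnmatchcase


def rough_standard_tokens(text):
    """Crude stand-in for a text analyzer: lowercase, then split on
    anything that is not a word character or an apostrophe."""
    return [t for t in re.split(r"[^\w']+", text.lower()) if t]


def wildcard_matches(text, pattern):
    """Compare the pattern against each term separately, which is how a
    term-level wildcard query sees an analyzed field."""
    return any(fnmatchcase(term, pattern) for term in rough_standard_tokens(text))


doc = "Kaplan isn't here"
print(rough_standard_tokens(doc))          # ['kaplan', "isn't", 'here']
print(wildcard_matches(doc, "*plan*"))     # True: matches the term 'kaplan'
print(wildcard_matches(doc, "*plan is*"))  # False: no single term spans the space
```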
Recommended answer
I was hoping Elasticsearch might have something built in for this, given that simple substring search seems like a very basic capability (thinking about it, it is implemented as strstr() in C, LIKE '%%' in SQL, Ctrl+F in most text editors, String.IndexOf in C#, etc.), but this seems not to be the case. Note that the regexp query doesn't support case insensitivity, so I also needed to pair it with this custom analyzer, so that the indexed values are all lowercase. Then I can convert my search string to lowercase as well.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    ...
    "description": {"type": "text", "analyzer": "lowercase_keyword"}
  }
}
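To make the effect of this analyzer concrete, here is a small Python imitation. It assumes that the keyword tokenizer emits the whole field value as one token and that the regular expression has to match that entire token, which is why the pattern needs .* on both sides. The function names are made up for illustration, and Python's re module is only an approximation of Lucene's regular expression syntax.

```python
import re


def lowercase_keyword(value):
    """Imitate the custom analyzer: the whole value becomes a single
    token (keyword tokenizer), folded to lowercase (lowercase filter)."""
    return [value.lower()]


def regexp_matches(value, pattern):
    """The pattern must cover the entire token, so use fullmatch."""
    return any(re.fullmatch(pattern, token) is not None
               for token in lowercase_keyword(value))


print(regexp_matches("Kaplan isn't here", ".*plan is.*"))  # True
print(regexp_matches("Kaplan isn't here", "plan is"))      # False: not the whole token
print(regexp_matches("The PLAN IS ready", ".*plan is.*"))  # True: index side is lowercased
```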
Example query:
"query": {
"regexp" : { "description" : ".*plan is.*" }
}
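When the search string comes from user input, it probably needs to be lowercased and have regex operators escaped before being wrapped in .*. The following is a minimal sketch of building that query body in Python; substring_query and escape_lucene_regex are hypothetical helpers, and the RESERVED set is my understanding of the characters Lucene regular expressions treat as operators (including the optional ones), so check it against your Elasticsearch version.

```python
import json

# Characters assumed to act as operators in Lucene regular expressions.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape every reserved character so it matches literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def substring_query(field, needle):
    """Build a regexp query body that matches `needle` anywhere in `field`.
    The needle is lowercased to line up with the lowercase_keyword analyzer."""
    pattern = ".*" + escape_lucene_regex(needle.lower()) + ".*"
    return {"query": {"regexp": {field: pattern}}}


print(json.dumps(substring_query("description", "Plan is")))
# {"query": {"regexp": {"description": ".*plan is.*"}}}
```

The resulting dict can be serialized and sent as the body of a search request with whatever client you already use.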
Thanks to Jai Sharma for leading me; I just wanted to provide more detail.