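I have alphanumeric codes such as Hcc18, HCC23 and I23, and I want to store them in Elasticsearch. On top of that I want to build the following two features:
1. Search by the full code or by its number, with the exact match first.
2. Autocomplete.
Example:
for hcc15 or 15, hcc15 should be in the output, at the top of the results.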
My current Elasticsearch mapping is:
{
"mappings": {
"properties": {
"code": {
"type": "text",
"analyzer": "autoanalyer"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"autoanalyer": {
"tokenizer": "standard",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"autotoken": {
"type": "simple_pattern",
"pattern": "[0-9]+"
}
}
}
}
}
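Note that autoanalyer, as declared, uses the standard tokenizer, so the autotoken tokenizer is defined but never referenced. The Python sketch below approximates what each of the two would produce for these codes. It is a simplified model under stated assumptions (standard tokenizer approximated with \w+ for plain ASCII codes, simple_pattern approximated with re.findall), not Elasticsearch's implementation; the _analyze API on a real cluster is the authoritative check.

```python
import re

# Rough stand-ins for the Elasticsearch analysis chain (assumptions, not the real Lucene code):
# - the standard tokenizer keeps ASCII letters and digits together in one token,
#   approximated here with \w+, followed by the lowercase filter;
# - the simple_pattern tokenizer with pattern [0-9]+ emits only runs of digits.


def standard_lowercase(text: str) -> list[str]:
    """Approximates tokenizer "standard" + filter "lowercase" for plain ASCII codes."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def autotoken(text: str) -> list[str]:
    """Approximates the autotoken tokenizer (simple_pattern, pattern [0-9]+)."""
    return re.findall(r"[0-9]+", text)


for code in ["Hcc18", "HCC23", "I23"]:
    print(code, standard_lowercase(code), autotoken(code))
```

Under this approximation the declared analyzer indexes Hcc18 as the single token hcc18, whereas the autotoken tokenizer would index only 18.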
Query:
{
"min_score": 0.1,
"from": 0,
"size": 10000,
"query": {
"bool": {
"should": [{ "match": {"code": search_term}}]
}
}
}
The two problems I'm facing with this approach:
1. I get all the codes related to the number 420, but the exact match I420 is not at the top.
2. Autocomplete.
Best answer
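You have multiple requirements, and all of them can be met with a multi-field (a keyword field plus a text sub-field analyzed with your digit-extracting tokenizer) together with a bool query that combines a prefix query and a match query.
Below is a step-by-step example using the OP's data and queries.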
Index definition (my_analyzer reuses your autotoken tokenizer to extract the numbers):
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "autotoken"
}
},
"tokenizer": {
"autotoken": {
"type": "simple_pattern",
"pattern": "[0-9]+"
}
}
}
},
"mappings": {
"properties": {
"code": {
"type": "keyword",
"fields": {
"number": {
"type": "text",
"analyzer" : "my_analyzer"
}
}
}
}
}
}
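With this mapping each document's code value is indexed twice: unchanged as a keyword in code, and as digit tokens in code.number. A minimal Python model of that, assuming my_analyzer has no token filters so the digit runs pass through as-is; the dict layout is hypothetical and only for illustration:

```python
import re


def index_code(code: str) -> dict:
    """Hypothetical model of what the multi-field mapping stores for one value."""
    return {
        "code": code,                              # keyword: whole value, unchanged, case-sensitive
        "code.number": re.findall(r"[0-9]+", code),  # text + my_analyzer: only the runs of digits
    }


print(index_code("I420"))    # {'code': 'I420', 'code.number': ['420']}
print(index_code("hcc420"))  # {'code': 'hcc420', 'code.number': ['420']}
```

The two sub-fields serve different clauses: code supports exact and prefix matching on the full code, while code.number lets a bare number such as 420 find every code that contains it.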
Index some documents
{
"code" : "hcc420"
}
{
"code" : "HCC23"
}
{
"code" : "I23"
}
{
"code" : "I420"
}
{
"code" : "I421"
}
{
"code" : "hcc420"
}
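To load these sample documents in one request, the _bulk API accepts newline-delimited JSON: an action line followed by the document source for each document, with a trailing newline. A sketch that builds that body, assuming the index name so_number (taken from the results in this answer); no _id is set, so Elasticsearch generates IDs and they will not match the _id values shown in the results:

```python
import json

# The sample documents above (hcc420 appears twice, as in the original list).
SAMPLE_CODES = ["hcc420", "HCC23", "I23", "I420", "I421", "hcc420"]


def bulk_body(index: str, codes: list[str]) -> str:
    """Builds an NDJSON body for the _bulk API: one action line plus one source line per document."""
    lines = []
    for code in codes:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line; no _id, so one is generated
        lines.append(json.dumps({"code": code}))                # document source
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body("so_number", SAMPLE_CODES)
print(body, end="")
```

The resulting text can be sent as the body of POST /_bulk with the header Content-Type: application/x-ndjson.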
Search query (problem 1: searching for I420 should bring back 2 documents from the sample data, I420 and hcc420, but the exact match I420 must score higher):
{
"query": {
"bool": {
"should": [
{
"prefix": {
"code": {
"value": "I420"
}
}
},
{
"match": {
"code.number": "I420"
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so_number",
"_type": "_doc",
"_id": "4",
"_score": 2.0296195, --> note: the exact match has the higher score
"_source": {
"code": "I420"
}
},
{
"_index": "so_number",
"_type": "_doc",
"_id": "7",
"_score": 1.0296195,
"_source": {
"code": "hcc420"
}
}
]
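Why I420 scores exactly 1.0 higher: in a bool query the scores of all matching should clauses are added. The prefix query is constant-score by default, so it contributes its boost of 1.0, and only I420 starts with "I420"; both documents match the code.number clause, which contributes about 1.0296195 here. A sketch of that sum, assuming the match score is simply copied from the result above rather than computed with BM25 (in general it varies per document):

```python
import re

MATCH_SCORE = 1.0296195  # assumption: the code.number clause score observed in the result above
PREFIX_SCORE = 1.0       # prefix query is constant-score; default boost is 1.0


def digits(text: str) -> set[str]:
    """Digit tokens, as produced by the autotoken tokenizer (pattern [0-9]+)."""
    return set(re.findall(r"[0-9]+", text))


def score(code: str, term: str) -> float:
    """Sums the scores of the should clauses that match one document."""
    total = 0.0
    if code.startswith(term):        # prefix clause on the keyword field (case-sensitive)
        total += PREFIX_SCORE
    if digits(code) & digits(term):  # match clause on code.number: any shared digit token
        total += MATCH_SCORE
    return total


for code in ["I420", "hcc420"]:
    print(code, round(score(code, "I420"), 7))
```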
Part 2: autocomplete, which works with the same search query
So searching for I42 must return I420 and I421 from the sample documents:
{
"query": {
"bool": {
"should": [
{
"prefix": {
"code": {
"value": "I42"
}
}
},
{
"match": {
"code.number": "I42"
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so_number",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"code": "I420"
}
},
{
"_index": "so_number",
"_type": "_doc",
"_id": "5",
"_score": 1.0,
"_source": {
"code": "I421"
}
}
]
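Here only the prefix clause matches: the match clause looks for the digit token 42, which no document contains, so both hits get exactly 1.0. Because code is a keyword field without a normalizer, the prefix comparison is case-sensitive. A sketch of the prefix side, assuming Python's str.startswith is a fair model of a prefix query on an un-normalized keyword field:

```python
SAMPLE_CODES = ["hcc420", "HCC23", "I23", "I420", "I421"]


def prefix_hits(term: str, codes: list[str]) -> list[str]:
    """Models a prefix query on a keyword field: values are stored as-is, so case matters."""
    return [code for code in codes if code.startswith(term)]


print(prefix_hits("I42", SAMPLE_CODES))  # ['I420', 'I421']
print(prefix_hits("i42", SAMPLE_CODES))  # []
```

If lowercase input such as i42 should also autocomplete, options worth testing are a lowercase normalizer on the keyword field (with the search term lowercased as well) or, on Elasticsearch 7.10 and later, the case_insensitive parameter of the prefix query.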
Let's take one more example, a numeric search: searching for 420 must bring back hcc420 and I420.
Search query
{
"query": {
"bool": {
"should": [
{
"prefix": {
"code": {
"value": "420"
}
}
},
{
"match": {
"code.number": "420"
}
}
]
}
}
}
And whoa, again it gives the expected results 😀
Result
"hits": [
{
"_index": "so_number",
"_type": "_doc",
"_id": "4",
"_score": 1.0296195,
"_source": {
"code": "I420"
}
},
{
"_index": "so_number",
"_type": "_doc",
"_id": "7",
"_score": 1.0296195,
"_source": {
"code": "hcc420"
}
}
]
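The two hits tie because no code starts with "420", so the prefix clause matches nothing and only the code.number clause fires; both documents hold the same single digit token 420 there. A sketch that lists which clause matches each sample document, under the same simplifying assumptions as the sketches above (str.startswith for the keyword prefix, re.findall for the autotoken tokenizer):

```python
import re

SAMPLE_CODES = ["hcc420", "HCC23", "I23", "I420", "I421"]


def matched_clauses(code: str, term: str) -> list[str]:
    """Returns the names of the should clauses that match one document."""
    clauses = []
    if code.startswith(term):  # prefix clause on the keyword field
        clauses.append("prefix")
    if set(re.findall(r"[0-9]+", code)) & set(re.findall(r"[0-9]+", term)):
        clauses.append("match")  # match clause on code.number
    return clauses


for code in SAMPLE_CODES:
    print(code, matched_clauses(code, "420"))
```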
Regarding regex - ElasticSearch Analyzer autocomplete for alphanumeric codes, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/60975192/