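I have been trying to build a search module for my application using ElasticSearch. Below is the index structure I put together from sample code I found in other StackOverflow posts.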
{
  "megacorp4": {
    "settings": {
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "my_ngram_tokenizer",
            "filter": [
              "my_ngram_filter"
            ]
          }
        },
        "filter": {
          "my_ngram_filter": {
            "type": "edgeNGram",
            "min_gram": 3,
            "max_gram": 15
          }
        },
        "tokenizer": {
          "my_ngram_tokenizer": {
            "type": "edgeNGram",
            "min_gram": 3,
            "max_gram": 15
          }
        }
      },
      "mappings": {
        "employee": {
          "properties": {
            "about": {
              "type": "string",
              "analyzer": "my_analyzer"
            },
            "age": {
              "type": "long"
            },
            "first_name": {
              "type": "string"
            },
            "interests": {
              "type": "string",
              "analyzer": "my_analyzer"
            },
            "last_name": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
These are the records I inserted to test the search functionality:
[
  {
    "first_name": "John",
    "last_name": "Smith",
    "age": 25,
    "about": "I love to go rock climbing",
    "interests": [
      "sports",
      "music"
    ]
  },
  {
    "first_name": "Douglas",
    "last_name": "Fir",
    "age": 35,
    "about": "I like to build album climb cabinets",
    "interests": [
      "forestry",
      "music"
    ]
  },
  {
    "first_name": "Jane",
    "last_name": "Smith",
    "age": 32,
    "about": "I like to collect rock albums",
    "interests": [
      "music"
    ]
  }
]
I searched the "about" field both through the REST API (via Postman) and with the Python client, as follows:
API query:
localhost:9200/megacorp4/_search?q=climb
Python query:
from elasticsearch import Elasticsearch
from pprint import pprint
es = Elasticsearch()
res = es.search(index="megacorp4", body={"query": {"match": {'about':"climb"}}})
pprint(res)
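I only get exact matches; records containing "climbing" do not appear in the output. However, when I replace "climb" with "climb*" in the query, I get 2 records, one with "climb" and one with "climbing". I do not want to use the "*" wildcard approach.
I also tried the built-in "english", "standard" and "ngram" analyzers, but nothing seemed to work.
I need help making a keyword search match partial words in full text.
Thanks in advance.
Best answer
Use this mapping instead: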
DELETE /test
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_ngram_filter"
          ]
        }
      },
      "filter": {
        "my_ngram_filter": {
          "type": "edgeNGram",
          "min_gram": 3,
          "max_gram": 15
        }
      }
    }
  },
  "mappings": {
    "employee": {
      "properties": {
        "about": {
          "type": "string",
          "analyzer": "my_analyzer"
        },
        "age": {
          "type": "long"
        },
        "first_name": {
          "type": "string"
        },
        "interests": {
          "type": "string",
          "analyzer": "my_analyzer"
        },
        "last_name": {
          "type": "string"
        }
      }
    }
  }
}
POST /test/employee/_bulk
{"index":{}}
{"first_name":"John","last_name":"Smith","age":25,"about":"I love to go rock climbing","interests":["sports","music"]}
{"index":{}}
{"first_name":"Douglas","last_name":"Fir","age":35,"about":"I like to build album climb cabinets","interests":["forestry","music"]}
{"index":{}}
{"first_name":"Jane","last_name":"Smith","age":32,"about":"I like to collect rock albums","interests":["music"]}
GET /test/_search?q=about:climb
GET /test/_search
{
  "query": {
    "query_string": {
      "query": "about:climb"
    }
  }
}
GET /test/_search
{
  "query": {
    "match": {
      "about": "climb"
    }
  }
}
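For readers using the Python client as in the question, the two body-based searches above might be sketched as follows. This is only an illustration: the helper names `match_about`, `query_string_about` and `run` are made up for this example, the `body=` keyword simply follows the question's snippet (newer client releases may prefer different keywords), and a node on localhost:9200 is assumed.

```python
def match_about(term):
    """Request body for a `match` query on the `about` field."""
    return {"query": {"match": {"about": term}}}


def query_string_about(term):
    """Request body for a `query_string` query; the `about:` prefix
    targets the field explicitly instead of the default search field."""
    return {"query": {"query_string": {"query": "about:" + term}}}


def run(index="test", term="climb"):
    """Send both queries to a local node and print the raw responses."""
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch
    from pprint import pprint

    es = Elasticsearch()  # assumption: default node on localhost:9200
    for body in (match_about(term), query_string_about(term)):
        pprint(es.search(index=index, body=body))
```

Calling `run()` against the `test` index created above should issue the same two searches as the console requests; pass `index="megacorp4"` to try it on the original index.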
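Two changes:
- the settings part (here "mappings" is a sibling of "settings" rather than nested inside it)
- the standard tokenizer (so the edgeNGram filter is applied to each word rather than to the text as a whole)

As for the ?q=climb part: by default this searches the _all field, which is analyzed with the standard analyzer and not your custom one. So the correct query is
localhost:9200/megacorp4/_search?q=about:climb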
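To see why the tokenizer choice matters, here is a rough pure-Python simulation of the two analysis chains. It is a sketch, not Elasticsearch itself: `standard_tokens` only approximates the standard tokenizer by splitting on non-word characters, the edge n-gram tokenizer is assumed to run with its default settings (the whole input treated as one token), and all function names are invented for this example.

```python
import re


def standard_tokens(text):
    """Approximate the standard tokenizer: split on non-word characters."""
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=3, max_gram=15):
    """Prefixes of `token` whose length lies between min_gram and max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def analyze_with_filter(text):
    """standard tokenizer followed by an edge n-gram filter (the answer's analyzer)."""
    return [gram for tok in standard_tokens(text) for gram in edge_ngrams(tok)]


def analyze_with_tokenizer(text):
    """Edge n-gram tokenizer over the unsplit input (assumed default behaviour)."""
    return edge_ngrams(text)


about = "I love to go rock climbing"
filter_terms = analyze_with_filter(about)
tokenizer_terms = analyze_with_tokenizer(about)

print(filter_terms)     # prefixes of each word, including "climb"
print(tokenizer_terms)  # prefixes of the whole sentence: "I l", "I lo", ...
print("climb" in filter_terms, "climb" in tokenizer_terms)  # True False
```

With the filter applied per word, "climbing" contributes the term "climb", so a search for "climb" on the "about" field can match it. With the n-grams taken from the start of the whole string, no term ever begins at "climb".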