I indexed the following book:
curl -X PUT localhost:9200/books/book/1 -d '{
"title": "All Quiet on the Western Front",
"author": "Erich Maria Remarque",
"year": 1929
}'
I'm trying to implement a phrase suggester using the code from the official docs.
So I tried:
curl -XPOST 'localhost:9200/books/_search' -d '{
"suggest" : {
"text" : "al quet",
"simple_phrase" : {
"phrase" : {
"analyzer" : "body",
"field" : "bigram",
"size" : 1,
"real_word_error_likelihood" : 0.95,
"max_errors" : 0.5,
"gram_size" : 2,
"direct_generator" : [ {
"field" : "title",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}'
I expected this to correct al quet to all quiet, but I get the following error:
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "Analyzer [body] doesn't exists"
If I change "analyzer" : "body" to "analyzer" : "title", I get the same error, but for title:
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "Analyzer [title] doesn't exists"
If I change "analyzer" : "body" to "analyzer" : "default", that line no longer raises an error, but the next one, "field" : "bigram", does:
"error" : {
"root_cause" : [ {
"type" : "illegal_argument_exception",
"reason" : "No mapping found for field [bigram]"
The only way I could make this work is to use "analyzer" : "default" together with "field" : "title":
curl -XPOST 'localhost:9200/books/_search?pretty=true' -d '{
"suggest" : {
"text" : "al quet",
"simple_phrase" : {
"phrase" : {
"analyzer" : "default",
"field" : "title",
"size" : 1,
"real_word_error_likelihood" : 0.95,
"max_errors" : 0.5,
"gram_size" : 2,
"direct_generator" : [ {
"field" : "title",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}'
With this I get the following output:
"suggest" : {
"simple_phrase" : [ {
"text" : "al quet",
"offset" : 0,
"length" : 7,
"options" : [ {
"text" : "al quiet",
"highlighted" : "al <em>quiet</em>",
"score" : 0.09049256
} ]
} ]
}
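As you can see, it corrects quet to quiet but leaves al alone, and all my other attempts behave the same way: only one word ever gets corrected. How can I build a working phrase suggester that takes al quet as input, as in the example, and returns all quiet?
Best answer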
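You get the first error because your index has no analyzer named body, nor one named title.
The second error is caused by the missing field bigram: your index has only three fields, title, author and year.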
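The docs example assumes that the body analyzer and the bigram field already exist in the index. The sketch below shows index settings that would create comparable pieces for this index. It assumes you can recreate the books index, for example by deleting it, sending this body with PUT to localhost:9200/books, and then indexing the book again. The names bigram_shingle, bigram_analyzer and title.bigram are hypothetical. The syntax targets the typed 2.x-style mapping (books/book) used in the question. On 5.x and later, use "type": "text" instead of "string". On 7.x and later, drop the "book" level under "mappings".

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "bigram_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": true
        }
      },
      "analyzer": {
        "bigram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "bigram_shingle"]
        }
      }
    }
  },
  "mappings": {
    "book": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "bigram": {
              "type": "string",
              "analyzer": "bigram_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With a mapping like this, the question's original query should get past the missing-field error once "field" is set to "title.bigram" and the "analyzer" line is removed. Without that line, the phrase suggester falls back to the search analyzer of the field it is given.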
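With your current setup, you need to give max_errors a higher value to make the suggester work. According to the documentation, max_errors is the maximum percentage of terms considered to be misspellings in order to form a correction. A float in the range [0..1) is read as a fraction of the query terms, and a number >= 1 is read as an absolute number of terms. Because al quet has only two terms, a value of 0.5 allows just one of them to be corrected, which is why only quet was fixed.
So this should give you the output you want. The only changes from your working query are max_errors, raised from 0.5 to 0.9, and "size": 0, so that no search hits are returned: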
{
"suggest": {
"text": "al quet",
"simple_phrase": {
"phrase": {
"analyzer": "default",
"field": "title",
"size": 1,
"real_word_error_likelihood": 0.95,
"max_errors": 0.9,
"gram_size": 2,
"direct_generator": [
{
"field": "title",
"suggest_mode": "always",
"min_word_length": 1
}
],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
},
"size": 0
}
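You may want to use shingles for the phrase field. You can also use collate so that only suggestions that actually match documents in the index are returned. I have given a detailed answer to this question, which may help.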
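The sketch below combines both ideas. It assumes that title has a shingle subfield named title.bigram (a hypothetical name that must first be defined in the index mapping). The analyzer is left out, so the suggester uses the search analyzer of title.bigram. The collate section runs a query for each candidate phrase. Because prune is not set, candidates that match no document are dropped. The docs example uses a plain match query; match_phrase is swapped in here so that a suggestion survives only if its words appear together in a title. Depending on your version, the template key is inline (2.x and 5.x) or source (6.x and later).

```json
{
  "suggest": {
    "text": "al quet",
    "simple_phrase": {
      "phrase": {
        "field": "title.bigram",
        "size": 1,
        "real_word_error_likelihood": 0.95,
        "max_errors": 0.9,
        "gram_size": 2,
        "direct_generator": [
          {
            "field": "title",
            "suggest_mode": "always",
            "min_word_length": 1
          }
        ],
        "collate": {
          "query": {
            "inline": {
              "match_phrase": {
                "{{field_name}}": "{{suggestion}}"
              }
            }
          },
          "params": { "field_name": "title" }
        },
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  },
  "size": 0
}
```

If you set "prune": true inside collate, all suggestions are returned instead. Each option then carries a collate_match flag showing whether any document matched.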