This is the mapping of my index:
{
"itens" : {
"mappings" : {
"properties" : {
"card_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
When I run this search:
GET itens/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "camisa",
"_name": "camisa"
}
}
},
{
"match": {
"name": {
"query": "flamengo",
"_name": "flamengo"
}
}
},
{
"match": {
"name": {
"query": "edição",
"_name": "edição"
}
}
},
{
"match": {
"name": {
"query": "torcedor",
"_name": "torcedor"
}
}
}
]
}
}
}
I get the following results:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : 3.2621913,
"hits" : [
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "lDJ-5WwBSsI9bleNzslS",
"_score" : 3.2621913,
"_source" : {
"card_id" : "centauro",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "lzKB5WwBSsI9bleNeMnt",
"_score" : 3.0658486,
"_source" : {
"card_id" : "centauro",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "yV4q0WwB-vWXMqGoqMdJ",
"_score" : 2.7421699,
"_source" : {
"card_id" : "centauro",
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
...and some others...
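My question is: why are the second and third results ranked below the first one (with lower scores), and how can I fix it?
The second and third results each match 3 of the queries, while the first matches only 2. That looks like the wrong relevance order to me, because the second and third results are more relevant to my search than the first.
I found this Elasticsearch doc about relevance that looks wrong, and tried searching with
_search?search_type=dfs_query_then_fetch
but got the same results.
Edit:
I created a new test index with the same mapping and inserted the 3 documents discussed above:
Bola Nike Edição Flamengo
Camisa do Flamengo Vermelha Edição 100 Anos
Camisa Flamengo 2019 Masculina Modelo Torcedor
I ran the same query and the results were ranked as expected. So I thought the problem might only appear when the index contains documents other than these 3. I inserted the other documents from the original index and, bang!, the problem came back.
I only needed to insert 2 more documents to reproduce it:
Camisa Palmeiras 2019 Masculina Modelo Torcedor
Camisa Internacional 2019 Masculina Modelo Torcedor
My search result now looks like this: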
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.6201596,
"hits" : [
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "nzKM8mwBSsI9bleNrsmM",
"_score" : 1.6201596,
"_source" : {
"card_id" : "some place",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "gCaO8mwBepmixz6CaMCt",
"_score" : 1.5693209,
"_source" : {
"card_id" : "some place",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "fyaN8mwBepmixz6CQcBc",
"_score" : 1.3466781,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "gSaP8mwBepmixz6CbsDW",
"_score" : 0.8151792,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Palmeiras 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor"
]
},
{
"_index" : "teste",
"_type" : "_doc",
"_id" : "giaP8mwBepmixz6C4MCL",
"_score" : 0.8151792,
"_source" : {
"card_id" : "some place",
"name" : "Camisa Internacional 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor"
]
}
]
}
}
I ran the search with ?explain=true. The full result is too long to paste here, but here is the explanation for the first two documents:
{
"_shard" : "[teste][0]",
"_node" : "xnRySBw_T7Kjsl4wAa_2yg",
"_index" : "teste",
"_type" : "_doc",
"_id" : "nzKM8mwBSsI9bleNrsmM",
"_score" : 1.6201596,
"_source" : {
"card_id" : "some place",
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
],
"_explanation" : {
"value" : 1.6201596,
"description" : "sum of:",
"details" : [
{
"value" : 0.6173784,
"description" : "weight(name:flamengo in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.6173784,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.5389965,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 3,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.52064633,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 1.0027812,
"description" : "weight(name:edição in 0) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 1.0027812,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.87546873,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.52064633,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 4.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
},
{
"_shard" : "[teste][0]",
"_node" : "xnRySBw_T7Kjsl4wAa_2yg",
"_index" : "teste",
"_type" : "_doc",
"_id" : "gCaO8mwBepmixz6CaMCt",
"_score" : 1.5693209,
"_source" : {
"card_id" : "some place",
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
],
"_explanation" : {
"value" : 1.5693209,
"description" : "sum of:",
"details" : [
{
"value" : 0.26523292,
"description" : "weight(name:camisa in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.26523292,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.2876821,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 4,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.4969361,
"description" : "weight(name:flamengo in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.4969361,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.5389965,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 3,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
},
{
"value" : 0.80715185,
"description" : "weight(name:edição in 1) [PerFieldSimilarity], result of:",
"details" : [
{
"value" : 0.80715185,
"description" : "score(freq=1.0), product of:",
"details" : [
{
"value" : 2.2,
"description" : "boost",
"details" : [ ]
},
{
"value" : 0.87546873,
"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details" : [
{
"value" : 2,
"description" : "n, number of documents containing term",
"details" : [ ]
},
{
"value" : 5,
"description" : "N, total number of documents with field",
"details" : [ ]
}
]
},
{
"value" : 0.41907516,
"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details" : [
{
"value" : 1.0,
"description" : "freq, occurrences of term within document",
"details" : [ ]
},
{
"value" : 1.2,
"description" : "k1, term saturation parameter",
"details" : [ ]
},
{
"value" : 0.75,
"description" : "b, length normalization parameter",
"details" : [ ]
},
{
"value" : 7.0,
"description" : "dl, length of field",
"details" : [ ]
},
{
"value" : 5.8,
"description" : "avgdl, average length of field",
"details" : [ ]
}
]
}
]
}
]
}
]
}
}
I don't know what to look for here. What I do know is that the first result should score lower than the second.
Best answer
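Elasticsearch 7.0 changed the default number of primary shards to 1. So as long as you don't explicitly specify a different number, you should no longer run into the problem described in that doc. In your query result you can see that the index has only one shard: "_shards" : { "total" : 1
First, let's create a minimal reproducible example.
Mapping: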
PUT itens
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Sample documents:
PUT itens/_doc/1
{
"name": "Bola Nike Edição Flamengo"
}
PUT itens/_doc/2
{
"name": "Camisa do Flamengo Vermelha Edição 100 Anos"
}
PUT itens/_doc/3
{
"name": "Camisa Flamengo 2019 Masculina Modelo Torcedor"
}
I'm using the query you provided above and get the following result:
"hits" : [
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.5471338,
"_source" : {
"name" : "Camisa Flamengo 2019 Masculina Modelo Torcedor"
},
"matched_queries" : [
"camisa",
"torcedor",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.97927666,
"_source" : {
"name" : "Camisa do Flamengo Vermelha Edição 100 Anos"
},
"matched_queries" : [
"camisa",
"edição",
"flamengo"
]
},
{
"_index" : "itens",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6860854,
"_source" : {
"name" : "Bola Nike Edição Flamengo"
},
"matched_queries" : [
"edição",
"flamengo"
]
}
]
So with the minimal example you get the result you expect.
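To see why the order changes once more documents are added, the scores can be recomputed by hand. The following is a minimal Python sketch, not Elasticsearch code: it uses the idf and tf formulas and the k1, b and boost values printed in the explain output in the question, and it assumes that lowercasing and splitting on whitespace is close enough to the default analyzer for these product names. The function names are made up for illustration.

```python
import math

def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for the default analyzer.
    return text.lower().split()

def bm25_scores(docs, query_terms, k1=1.2, b=0.75, boost=2.2):
    """Score every document against a bool/should of single-term matches."""
    tokens = [tokenize(d) for d in docs]
    N = len(tokens)                                # documents with the field
    avgdl = sum(len(t) for t in tokens) / N        # average field length
    scores = []
    for toks in tokens:
        total = 0.0
        for term in query_terms:
            freq = toks.count(term)
            if freq == 0:
                continue                           # clause does not match
            n = sum(1 for t in tokens if term in t)  # documents containing term
            idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
            tf = freq / (freq + k1 * (1 - b + b * len(toks) / avgdl))
            total += boost * idf * tf
        scores.append(total)
    return scores

query = ["camisa", "flamengo", "edição", "torcedor"]
three_docs = [
    "Bola Nike Edição Flamengo",
    "Camisa do Flamengo Vermelha Edição 100 Anos",
    "Camisa Flamengo 2019 Masculina Modelo Torcedor",
]
five_docs = three_docs + [
    "Camisa Palmeiras 2019 Masculina Modelo Torcedor",
    "Camisa Internacional 2019 Masculina Modelo Torcedor",
]

three = bm25_scores(three_docs, query)
five = bm25_scores(five_docs, query)
for name, s3, s5 in zip(three_docs, three, five):
    print(f"{name}: 3 docs -> {s3:.4f}, 5 docs -> {s5:.4f}")
```

With five documents, "camisa" appears in 4 of 5 names, so its idf is low and matching it adds little, while "edição" appears in only 2 and adds a lot. On top of that, "Bola Nike Edição Flamengo" has 4 tokens against an average of 5.8, so length normalization gives each of its matching terms a higher tf than the same terms get in the 7-token name. Two strong matches in a short field end up outscoring three weaker matches in a longer one.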
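To debug what is happening with your query, add the ?explain=true parameter so that the whole request line reads GET itens/_search?explain=true. This adds a lot of output, but it should explain much better what is going on. Please add that output to your original question, and leave a comment if the result is unclear so we can take another look.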
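Since the explain output is long, it can help to boil each hit down to one number per matched term. Below is a small Python sketch of a hypothetical helper (term_contributions is not an Elasticsearch API) that walks an _explanation tree and collects the value of every node whose description starts with weight(field:term in ...), which is the shape shown in the question. The sample tree reuses two values from that output with the nested details left out.

```python
import re

# Matches descriptions such as "weight(name:flamengo in 0) [PerFieldSimilarity], result of:"
WEIGHT_RE = re.compile(r"weight\((\w+):(.+?) in \d+\)")

def term_contributions(explanation):
    """Return {term: score} for every weight(...) node in an explain tree."""
    found = {}
    stack = [explanation]
    while stack:
        node = stack.pop()
        match = WEIGHT_RE.match(node.get("description", ""))
        if match:
            found[match.group(2)] = node["value"]
        else:
            # Not a term node (e.g. "sum of:"), so keep descending.
            stack.extend(node.get("details", []))
    return found

# Trimmed copy of the first hit's _explanation from the question.
sample = {
    "value": 1.6201596,
    "description": "sum of:",
    "details": [
        {"value": 0.6173784,
         "description": "weight(name:flamengo in 0) [PerFieldSimilarity], result of:",
         "details": []},
        {"value": 1.0027812,
         "description": "weight(name:edição in 0) [PerFieldSimilarity], result of:",
         "details": []},
    ],
}

contributions = term_contributions(sample)
print(contributions)
```

Comparing these per-term numbers across hits shows quickly which term, and which of idf or tf, is responsible for a ranking that looks wrong.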