I am testing the boost operator in Lucene and ran into some strange behavior.

query1 = "red fox"
query2 = "red^1.2 fox"


When I run both queries against the text:


  “great red fox”


query2 gets a lower score than query1, but I expected query2 to win.

Here are the explanations for both queries.

Explain for query1

{0,4339554 = (MATCH) sum of:
  0,2169777 = (MATCH) weight(content:fox in 0), product of:
    0,7071068 = queryWeight(content:fox), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,2169777 = (MATCH) weight(content:red in 0), product of:
    0,7071068 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,304384 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}


Explain for query2

{0,4313012 = (MATCH) sum of:
  0,2396118 = (MATCH) weight(content:fox^1.25 in 0), product of:
    0,7808688 = queryWeight(content:fox^1.25), product of:
      1,25 = boost
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:fox in 0), product of:
      1 = tf(termFreq(content:fox)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
  0,1916894 = (MATCH) weight(content:red in 0), product of:
    0,6246951 = queryWeight(content:red), product of:
      0,3068528 = idf(docFreq=1, maxDocs=1)
      2,035813 = queryNorm
    0,3068528 = (MATCH) fieldWeight(content:red in 0), product of:
      1 = tf(termFreq(content:red)=1)
      0,3068528 = idf(docFreq=1, maxDocs=1)
      1 = fieldNorm(field=content, doc=0)
}


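Why does the boosted query score lower than the plain one?

Best answer

This is due to the query norm (queryNorm in the explain output). This part of the scoring algorithm tries to make scores roughly comparable from one query to the next.

It is computed as: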


  $$\mathrm{queryNorm} = \frac{1}{\sqrt{\mathrm{sumOfSquaredWeights}}}$$


where:


  $$\mathrm{sumOfSquaredWeights} = (\text{query boost})^2 \cdot \sum_{t \in q} \bigl(\mathrm{idf}(t) \cdot \text{term boost}(t)\bigr)^2$$
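
As a rough check, the formula can be evaluated by hand with the idf value from the explain output. The sketch below is plain Python arithmetic, not Lucene code; the helper name `query_norm` is made up for illustration, and the 1.25 boost follows the explain output (which shows `fox^1.25`) rather than the `red^1.2` written in the query string.

```python
import math

# idf(docFreq=1, maxDocs=1), copied from the explain output
IDF = 0.3068528


def query_norm(term_boosts, idf=IDF, query_boost=1.0):
    """Hypothetical helper: 1 / sqrt(queryBoost^2 * sum((idf * termBoost)^2))."""
    sum_of_squared_weights = query_boost ** 2 * sum(
        (idf * boost) ** 2 for boost in term_boosts
    )
    return 1.0 / math.sqrt(sum_of_squared_weights)


# Plain query: both terms have the default boost of 1.0
norm_plain = query_norm([1.0, 1.0])

# Boosted query: one term boosted by 1.25, as shown in the explain output
norm_boosted = query_norm([1.25, 1.0])

# Compare with the queryNorm lines in the two explain outputs
print(f"plain:   {norm_plain:.6f}")
print(f"boosted: {norm_boosted:.6f}")
```

Raising a term boost increases sumOfSquaredWeights, so queryNorm shrinks, and every term weight in the boosted query is scaled down by the smaller norm.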


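If you take that factor out of the explanation, simply by dividing the final score by queryNorm, you will see that the second query does indeed get the higher score: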


query1-> .4339554 / 2.304384 = 0.1883
query2-> .4313012 / 2.035813 = 0.2119
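
The same division can be reproduced in a few lines. This is only a sketch of the arithmetic above, using the final scores and queryNorm values copied from the two explain outputs; the dictionary names are illustrative.

```python
# (final score, queryNorm) pairs copied from the explain outputs
explain_values = {
    "query1": (0.4339554, 2.304384),
    "query2": (0.4313012, 2.035813),
}

# Dividing the final score by queryNorm removes the normalization factor
unnormalized = {
    name: score / norm for name, (score, norm) in explain_values.items()
}

for name, value in unnormalized.items():
    print(f"{name}: {value:.4f}")
```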


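The larger point, though, is that you should not read too much into comparing scores from one query to the next. Scores are only really meaningful relative to the query that produced them. You can see in the explanation that the boosted term carries a greater relative weight in the score, which is really all a boost is meant to do.

Source: a similar question on Stack Overflow, "c# - Why does a boosted Lucene query score lower than the same plain query?": https://stackoverflow.com/questions/40250210/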
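
To make the relative-weight point concrete, the per-term contributions in the explain outputs can be turned into shares of the total score. This is a small sketch using only numbers copied from the explanations; note that the explain output shows the boost on `fox`, so that is the term whose share grows here.

```python
# Per-term contributions copied from the two explain outputs
contributions = {
    "query1": {"fox": 0.2169777, "red": 0.2169777},
    "query2": {"fox": 0.2396118, "red": 0.1916894},
}

# Share of the total score that each term accounts for within its own query
shares = {
    query: {term: weight / sum(terms.values()) for term, weight in terms.items()}
    for query, terms in contributions.items()
}

for query, terms in shares.items():
    for term, share in terms.items():
        print(f"{query} {term}: {share:.1%}")
```

In the plain query the two terms split the score evenly, while in the boosted query the boosted term accounts for the larger share, even though the absolute total is slightly lower.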
