Question
I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula looks like this:
score(q,d) = coord(q,d) x queryNorm(q) x SUM<t in q>( tf(t in d) x idf(t)^2 x t.getBoost() x norm(t,d) )
I understand every component in this formula except queryNorm(q). The official documentation describes it as a normalizing factor meant to make scores from different queries comparable.
Why do I need to compare scores between different queries? In other words, could you give an example of a context in which queryNorm(q) is useful?
Recommended answer
Good question, I've wondered about this myself. According to the ScoresAsPercentages argument, trying to compare scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.
My understanding is that, while queryNorm doesn't make scores strictly comparable, it does help. Scores are closer to comparable with the default queryNorm than without it.
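To make "closer to comparable" concrete, here is a rough numeric sketch in plain Python (not Lucene code). It assumes the default queryNorm is 1 / sqrt(sumOfSquaredWeights), with every boost, coord(q,d) and norm(t,d) fixed at 1 so that each term's weight is just its idf. The idf and tf values are made up for illustration.

```python
import math


def query_norm(idfs):
    """Assumed default: 1 / sqrt(sum of squared term weights), weight = idf."""
    return 1.0 / math.sqrt(sum(idf * idf for idf in idfs))


def raw_score(tfs, idfs):
    """Sum of tf * idf^2 over the query terms, i.e. the score without queryNorm."""
    return sum(tf * idf ** 2 for tf, idf in zip(tfs, idfs))


# Hypothetical queries: one term vs. four terms, every term with idf = 2.0.
short_idfs = [2.0]
long_idfs = [2.0, 2.0, 2.0, 2.0]

# A document that matches every query term once (tf = 1.0).
raw_short = raw_score([1.0], short_idfs)
raw_long = raw_score([1.0, 1.0, 1.0, 1.0], long_idfs)

normed_short = raw_short * query_norm(short_idfs)
normed_long = raw_long * query_norm(long_idfs)

# Gap between the two queries' scores, before and after queryNorm.
raw_ratio = raw_long / raw_short
normed_ratio = normed_long / normed_short

# Within one query, every document is scaled by the same factor,
# so the ranking of documents does not change.
doc_a = raw_score([1.0, 1.0, 1.0, 1.0], long_idfs)
doc_b = raw_score([1.0, 0.0, 0.0, 1.0], long_idfs)
same_order = (doc_a > doc_b) == (
    doc_a * query_norm(long_idfs) > doc_b * query_norm(long_idfs)
)

print(raw_ratio, normed_ratio, same_order)
```

With these made-up numbers the gap between the two queries shrinks but does not disappear, and the ranking inside a single query is untouched. That is all I mean by "it does help".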
I suppose it could also enable people to write their own Similarity and use this call to produce normalized, comparable scores, using algorithms that work in their particular case.
There has been some discussion on dropping it, which you might find interesting.