


I am a little confused by Lucene's scoring strategy. I know that Lucene's scoring formula looks like this:

score(q,d) = coord(q,d) x queryNorm(q) x SUM<t in q>( tf(t in d) x idf(t)^2 x t.getBoost() x norm(t,d) )

I understand every component in this formula except queryNorm(q). The official documentation describes it as a normalizing factor used to make scores between queries comparable.

Why would I need to compare scores between different queries? In other words, could you give an example showing a context in which queryNorm(q) is useful?



Good question, I've wondered this myself. According to this ScoresAsPercentages argument, attempting to compare scores across different queries or indexes, or even scores for the same query and index at different times, is a bad idea, and I agree.

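My understanding is that, while queryNorm doesn't make scores strictly comparable, it does help. Scores are closer to comparable with the default queryNorm than without it. It also has no effect on ranking within a single query, because every document's score is multiplied by the same factor.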
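To make that concrete, here is a rough sketch in plain Java (no Lucene dependency). It assumes the classic default definition, where queryNorm is 1 / sqrt(sumOfSquaredWeights) and each term contributes (idf x boost)^2 to that sum. The class name, the helper methods, and the idf values are all made up for illustration, and tf, norm, and coord are fixed at 1 to keep the arithmetic visible.

```java
// Illustrative sketch only: mirrors the classic default formula, not Lucene's actual classes.
class QueryNormSketch {

    // Sum over the query terms of (idf * boost)^2, assuming a query-level boost of 1.
    static double sumOfSquaredWeights(double[] idfs, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            double weight = idfs[i] * boosts[i];
            sum += weight * weight;
        }
        return sum;
    }

    // queryNorm under the assumed classic default: 1 / sqrt(sumOfSquaredWeights).
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // Score without queryNorm for a document that matches every term once
    // (assumes tf = 1, norm = 1, coord = 1): SUM( idf^2 * boost ).
    static double rawScore(double[] idfs, double[] boosts) {
        double sum = 0.0;
        for (int i = 0; i < idfs.length; i++) {
            sum += idfs[i] * idfs[i] * boosts[i];
        }
        return sum;
    }

    // The same score multiplied by queryNorm, as in the formula above.
    static double normalizedScore(double[] idfs, double[] boosts) {
        return queryNorm(sumOfSquaredWeights(idfs, boosts)) * rawScore(idfs, boosts);
    }

    public static void main(String[] args) {
        double[] noBoost = {1.0};
        double[] rareTermQuery = {8.0};   // hypothetical idf of a rare term
        double[] commonTermQuery = {2.0}; // hypothetical idf of a common term

        System.out.println("raw:        " + rawScore(rareTermQuery, noBoost)
                + " vs " + rawScore(commonTermQuery, noBoost));
        System.out.println("normalized: " + normalizedScore(rareTermQuery, noBoost)
                + " vs " + normalizedScore(commonTermQuery, noBoost));
    }
}
```

With these made-up numbers, the best possible raw scores for the two queries are 64 and 4, a factor of 16 apart. After multiplying by queryNorm they become 8 and 2, a factor of 4 apart. So the scores still aren't on a common scale, but the gap caused purely by how rare the query terms are gets smaller.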


I suppose it could also enable people to write their own Similarity and use this call to produce normalized, comparable scores with algorithms that work in their particular case.

There has been some discussion on dropping it, which you might find interesting.

