Problem Description
I have two methods that rank a list of strings differently, plus what we can consider to be the "right" ranking of the list (i.e. a gold standard).
In other words:
ranked_list_of_strings_1 = method_1(list_of_strings)
ranked_list_of_strings_2 = method_2(list_of_strings)
correctly_ranked_list_of_strings # Some permutation of list_of_strings
How can I determine which method is better, considering that method_1 and method_2 are black boxes? Are there any methods to measure this available in SciPy or scikit-learn or similar libraries?
In my specific case, I actually have a dataframe, and each method outputs a score. What matters is not the difference between each method's scores and the true scores, but that the methods get the ranking right (a higher score means a higher rank, for all columns).
      strings  scores_method_1  scores_method_2  true_scores
5714   aeSeOg             0.54              0.1          0.8
5741   NQXACs             0.15              0.3          0.4
5768   zsFZQi             0.57              0.7          0.2
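Because only the induced ordering matters, a quick first check (my own suggestion, not part of the recommended answer below) is a rank-correlation coefficient such as Kendall's tau, which SciPy ships as scipy.stats.kendalltau. A minimal sketch, assuming pandas and SciPy and the toy dataframe above:

# Kendall's tau measures agreement between two orderings:
# 1.0 = identical order, -1.0 = exactly reversed.
import pandas as pd
from scipy.stats import kendalltau

df = pd.DataFrame({
    "strings": ["aeSeOg", "NQXACs", "zsFZQi"],
    "scores_method_1": [0.54, 0.15, 0.57],
    "scores_method_2": [0.1, 0.3, 0.7],
    "true_scores": [0.8, 0.4, 0.2],
})

tau_1, _ = kendalltau(df["scores_method_1"], df["true_scores"])
tau_2, _ = kendalltau(df["scores_method_2"], df["true_scores"])
print(tau_1, tau_2)  # the method with the higher tau agrees more with the gold ranking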
Recommended Answer
You're looking for Normalized Discounted Cumulative Gain (NDCG). It's a metric commonly used in search-engine ranking to test the quality of a result ordering.
The idea is that you test your ranking (in your case, the two methods) against user feedback through clicks (in your case, the true ranking). NDCG tells you the quality of your ranking relative to the ground truth.
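For this specific case, scikit-learn also ships an NDCG implementation, sklearn.metrics.ndcg_score (available since scikit-learn 0.22). A minimal sketch using the toy scores above; the true scores serve as relevance judgments, and each method's scores are the ranking being evaluated:

import numpy as np
from sklearn.metrics import ndcg_score

# ndcg_score expects 2D arrays of shape (n_queries, n_items);
# here there is a single "query" over the three strings.
true_scores     = np.asarray([[0.8, 0.4, 0.2]])
scores_method_1 = np.asarray([[0.54, 0.15, 0.57]])
scores_method_2 = np.asarray([[0.1, 0.3, 0.7]])

# NDCG lies in [0, 1]; closer to 1.0 means the method's ordering
# is closer to the ordering induced by true_scores.
print(ndcg_score(true_scores, scores_method_1))
print(ndcg_score(true_scores, scores_method_2))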
Python has the RankEval module, which implements this metric (and some others, if you want to try them). The repo is here, and there is a nice IPython notebook with examples.