I have a table Foo(id, name, rateAvg, rateNum). rateAvg is between 1 and 5, and rateNum is the number of user ratings.

I query the table with MySQL and sort by the highest-rated Foos, like this:

SELECT * FROM Foo ORDER BY rateAvg DESC, rateNum DESC


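But this is not fair enough. For example, one row has a rateAvg of 4.8 from 1,000 ratings and another row has a rateAvg of 5 from 10 ratings, yet with my query the second row comes first.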

Edit:
From @kingkero's comment, I realized that rateAvg and rateNum should each carry some weight in the ordering. How can I apply that in my query?

Best answer

You can try applying a Bayesian average, but you should precompute that rating and store it in one of your fields.

b(r) = [ W(a) * a + W(r) * r ] / [ W(a) + W(r) ]

r = average rating for an item
W(r) = weight of that rating, which is the number of ratings
a = average rating for your collection
W(a) = weight of that average, which is an arbitrary number,
       but should be higher if you generally expect to have more ratings
       for your items; 100 is used here, for a database which expects
       many ratings per item
b(r) = new bayesian rating


For example, in your case:

a = (4.8 * 1000 + 5 * 10) / 1010 ≈ 4.8020
r1 = 4.8
r2 = 5
W(a) = 100 // arbitrary average weight
W(r1) = 1000 // weight of first rating
W(r2) = 10 // weight of second rating

b(r1) = (100 * 4.8020 + 1000 * 4.8) / (100 + 1000) ≈ 4.8002
b(r2) = (100 * 4.8020 + 10 * 5) / (100 + 10) ≈ 4.8200
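
If you want to check the arithmetic above yourself, here is a minimal Python sketch of the same formula. The function name bayesian_average and the variable names are my own choices for illustration, and the collection average a is assumed to be weighted by the number of ratings, as in the worked example. It prints both scores to four decimal places.

```python
def bayesian_average(r: float, w_r: float, a: float, w_a: float = 100.0) -> float:
    """Return b(r) = (W(a) * a + W(r) * r) / (W(a) + W(r)).

    r   : average rating of the item
    w_r : weight of that rating (the number of ratings)
    a   : average rating of the whole collection
    w_a : weight of the collection average (arbitrary; 100 as in the answer)
    """
    return (w_a * a + w_r * r) / (w_a + w_r)


# The two rows from the question, as (rateAvg, rateNum) pairs.
items = [(4.8, 1000), (5.0, 10)]

# Collection average, weighted by how many ratings each item has.
total_ratings = sum(num for _, num in items)
a = sum(avg * num for avg, num in items) / total_ratings

for avg, num in items:
    score = bayesian_average(avg, num, a)
    print(f"rateAvg={avg}, rateNum={num} -> b(r)={score:.4f}")
```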


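So you can see that the two values end up close to each other, and you can try changing the weight of the average, W(a), to tune this Bayesian average rating for your case.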
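
As a rough sketch of the "precompute and store" step, the following Python script uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own. The column name bayesRating and row C (4.9 from 500 ratings) are hypothetical additions for illustration; rows A and B are the ones from the question. The SQL statements are kept simple in the hope that they carry over to MySQL, but that is an assumption you should verify on your own server.

```python
import sqlite3

W_A = 100  # arbitrary weight of the collection average, as in the answer

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE Foo (id INTEGER PRIMARY KEY, name TEXT, rateAvg REAL, rateNum INTEGER)"
)
conn.executemany(
    "INSERT INTO Foo (name, rateAvg, rateNum) VALUES (?, ?, ?)",
    [
        ("A", 4.8, 1000),  # from the question
        ("B", 5.0, 10),    # from the question
        ("C", 4.9, 500),   # hypothetical extra row, for illustration only
    ],
)

# Hypothetical column that stores the precomputed score.
conn.execute("ALTER TABLE Foo ADD COLUMN bayesRating REAL")

# Step 1: collection average, weighted by the number of ratings per row.
(a,) = conn.execute("SELECT SUM(rateAvg * rateNum) / SUM(rateNum) FROM Foo").fetchone()

# Step 2: store b(r) = (W(a) * a + W(r) * r) / (W(a) + W(r)) for every row.
conn.execute(
    "UPDATE Foo SET bayesRating = (? * ? + rateNum * rateAvg) / (? + rateNum)",
    (W_A, a, W_A),
)

# Step 3: sort by the stored score instead of by rateAvg.
rows = conn.execute(
    "SELECT name, rateAvg, rateNum, bayesRating FROM Foo ORDER BY bayesRating DESC"
).fetchall()
ranking = [name for name, *_ in rows]

for name, avg, num, score in rows:
    print(f"{name}: rateAvg={avg}, rateNum={num}, bayesRating={score:.4f}")
```

With these three rows, C moves ahead of B even though B has the higher rateAvg, while B still stays ahead of A because B is above the collection average and A is below it. The stored column goes stale whenever ratings change, so it would have to be recomputed from time to time.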
