I have a table Foo(id, name, rateAvg, rateNum). rateAvg is between 1 and 5, and rateNum is the number of ratings submitted by users.
I query the table with MySQL and sort the Foos by highest rating, like this:
SELECT * FROM Foo ORDER BY rateAvg DESC, rateNum DESC
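But this is not fair enough. For example, one row has a rateAvg of 4.8 from 1,000 ratings and another has a rateAvg of 5 from 10 ratings, yet with my query the second row comes first.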
Edit:
From @kingkero's comment I realized that rateAvg and rateNum should each carry some weight in the ordering. How can I apply that in my query?
Best answer
You could try applying a Bayesian average, but you should precompute that rating and store it in one of your fields.
b(r) = [ W(a) * a + W(r) * r ] / [ W(a) + W(r) ]
r = average rating for an item
W(r) = weight of that rating, which is the number of ratings
a = average rating for your collection
W(a) = weight of that average, which is an arbitrary number,
but should be higher if you generally expect to have more ratings
for your items; 100 is used here, for a database which expects
many ratings per item
b(r) = new bayesian rating
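As a minimal sketch, the formula above could be written in Python as follows. The function name bayesian_average and its parameter names are made up for illustration and do not come from any library; the default weight of 100 is just the arbitrary value mentioned above.

```python
def bayesian_average(r, w_r, a, w_a=100):
    """Return b(r) = [W(a) * a + W(r) * r] / [W(a) + W(r)].

    r   -- average rating of the item
    w_r -- W(r), the number of ratings the item has
    a   -- average rating over the whole collection
    w_a -- W(a), the arbitrary weight given to the collection average
           (assumed default: 100, as in the description above)
    """
    return (w_a * a + w_r * r) / (w_a + w_r)


# An item with no ratings yet falls back to the collection average.
print(bayesian_average(r=0.0, w_r=0, a=4.0))
```

The larger W(r) is relative to W(a), the closer the result stays to the item's own average r; with few ratings it is pulled toward a.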
例如,在您的情况下:
a = (4.8 * 1000 + 5 * 10) / 1010 = 4.8019
r1 = 4.8
r2 = 5
W(a) = 100 // arbitrary average weight
W(r1) = 1000 // weight of first rating
W(r2) = 10 // weight of second rating
b(r1) = (100 * 4.8019 + 1000 * 4.8) / (100 + 1000) = 4.8001
b(r2) = (100 * 4.8019 + 10 * 5) / (100 + 10) = 4.8199
As you can see, these values are now close to each other, and you can try changing the average weight W(a) to tune this Bayesian average rating for your case.
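To show how this might look inside a query, here is a rough sketch using Python's built-in sqlite3 module with an in-memory database as a stand-in for the MySQL table. The two rows are the ones from the question; the names "first" and "second", the helper ranked, and the :w placeholder are assumptions made for the example. For brevity the rating is computed on the fly rather than stored in a column as suggested above.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table Foo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Foo (id INTEGER PRIMARY KEY, name TEXT, "
    "rateAvg REAL, rateNum INTEGER)"
)
conn.executemany(
    "INSERT INTO Foo (name, rateAvg, rateNum) VALUES (?, ?, ?)",
    [("first", 4.8, 1000), ("second", 5.0, 10)],  # rows from the question
)

# c.a is the collection average weighted by number of ratings;
# :w is W(a), the arbitrary weight of that average.
QUERY = """
SELECT f.name,
       (:w * c.a + f.rateNum * f.rateAvg) / (:w + f.rateNum) AS bayes
FROM Foo AS f
CROSS JOIN (
    SELECT SUM(rateAvg * rateNum) * 1.0 / SUM(rateNum) AS a FROM Foo
) AS c
ORDER BY bayes DESC
"""


def ranked(weight):
    """Return (name, bayesian rating) pairs, best first, for a given W(a)."""
    return conn.execute(QUERY, {"w": weight}).fetchall()


for name, bayes in ranked(100):
    print(name, bayes)
```

The computed ratings agree with the worked example to about three decimal places. With only these two rows the collection average is itself about 4.80, so the row with 10 ratings still sorts marginally ahead; in a larger table whose overall average is lower, a row with few ratings is pulled further toward that average. The SELECT itself uses only standard SQL, but placeholder syntax in MySQL depends on the client library.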