Problem Description
I'm using the ALS algorithm (implicitPrefs = True) in Spark (a recommendation-system algorithm). Normally, after running this algorithm, the predicted values should fall between 0 and 1, but I'm receiving values greater than 1:
"usn" : 72164,
"recommendations" : [
{
"item_code" : "C1346",
"rating" : 0.756096363067627
},
{
"item_code" : "C0117",
"rating" : 0.966064214706421
},
{
"item_code" : "I0009",
"rating" : 1.00000607967377
},
{
"item_code" : "C0102",
"rating" : 0.974934458732605
},
{
"item_code" : "I0853",
"rating" : 1.03272235393524
},
{
"item_code" : "C0103",
"rating" : 0.928574025630951
}
]
I don't understand why some of the ratings are greater than 1 ("rating" : 1.00000607967377 and "rating" : 1.03272235393524).

There is a similar question, but I still don't understand it: MLLib spark - ALS trainImplicit value more than 1

Can someone help me explain these unexpected values?
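For context, here is a minimal PySpark sketch of how recommendations like the ones above are typically produced with implicit-feedback ALS. The toy interaction data, the clicks and item_idx columns, and the hyperparameter values are assumptions for illustration, not taken from my actual code:

```python
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("implicit-als").getOrCreate()

# Toy implicit-feedback data: (user id, integer item index, interaction count).
# Real item codes like "C1346" would first be mapped to integer indices,
# e.g. with StringIndexer, since ALS needs numeric ids.
interactions = spark.createDataFrame(
    [(72164, 0, 3), (72164, 1, 1), (10001, 0, 5), (10001, 2, 2)],
    ["usn", "item_idx", "clicks"],
)

als = ALS(
    userCol="usn",
    itemCol="item_idx",
    ratingCol="clicks",
    implicitPrefs=True,   # implicit-feedback variant of ALS
    rank=10,
    regParam=0.1,
    alpha=1.0,
)
model = als.fit(interactions)

# Top-N recommendations per user; the "rating" scores are raw factor dot
# products and are NOT guaranteed to fall inside [0, 1].
recs = model.recommendForAllUsers(6)
recs.show(truncate=False)
```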
Recommended Answer
Don't worry about that! There is nothing wrong with ALS.
Nevertheless, as you saw, the prediction scores returned by ALS with implicit feedback in Apache Spark aren't normalized to fall within [0, 1]. You might even get negative values sometimes. (More on that here.)
ALS (Alternating Least Squares) minimizes its cost function by alternately re-computing the user factors and the item factors, solving a regularized least-squares problem at each step; the approximations involved are what allow it to scale.
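To see why the score is unbounded, note that the predicted "rating" is simply the dot product of a user's latent factor vector with an item's latent factor vector (in Spark these live in model.userFactors and model.itemFactors). A toy sketch with made-up factor values:

```python
import numpy as np

# Made-up latent factors for one user and one item (rank = 4), purely for
# illustration; real factors come from ALSModel.userFactors / itemFactors.
user_factors = np.array([0.8, -0.3, 1.1, 0.5])
item_factors = np.array([0.7, 0.2, 0.6, 0.4])

# The predicted "rating" is this dot product; nothing clips it to [0, 1],
# so values slightly above 1 (or below 0) are entirely possible.
prediction = float(user_factors.dot(item_factors))
print(prediction)  # 1.36
```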
As a matter of fact, normalizing those scores isn't really relevant, because the scores don't mean much on their own.
You can't use RMSE on those scores, for example, to evaluate the performance of your recommendations. If you are interested in evaluating this type of recommender, I advise you to read my answer on How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?
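As a rough illustration of the ranking-oriented evaluation that answer discusses, Spark ships RankingMetrics for this kind of measurement. The item lists below are invented placeholders and the cut-off of 5 is arbitrary:

```python
from pyspark.sql import SparkSession
from pyspark.mllib.evaluation import RankingMetrics

spark = SparkSession.builder.getOrCreate()

# Each element pairs the items recommended to a user with the items the user
# actually interacted with (held-out ground truth). IDs are placeholders.
prediction_and_labels = spark.sparkContext.parallelize([
    ([1, 2, 3, 4, 5], [1, 3, 7]),
    ([2, 6, 8, 9, 4], [2, 5]),
])

metrics = RankingMetrics(prediction_and_labels)
print(metrics.meanAveragePrecision)
print(metrics.precisionAt(5))
print(metrics.ndcgAt(5))
```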
There are many techniques used in research and/or industry to deal with this type of result. For example, you can binarize the predictions using a threshold.
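A minimal sketch of that thresholding idea; the sample scores mirror the question's output, and the 0.9 cut-off is purely illustrative, not a recommended value:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# One row per (user, item) with the raw ALS score, as in the question's output.
scores = spark.createDataFrame(
    [(72164, "I0853", 1.03272235393524),
     (72164, "C0103", 0.928574025630951),
     (72164, "C1346", 0.756096363067627)],
    ["usn", "item_code", "rating"],
)

# Binarize: treat anything at or above the threshold as "recommend".
threshold = 0.9
binarized = scores.withColumn(
    "recommended", (F.col("rating") >= threshold).cast("int")
)
binarized.show()
```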