This article explains how to handle prediction values greater than 1 from Spark's ALS recommendation system. It should be a useful reference if you run into the same problem.

Problem Description

I'm using the ALS algorithm (implicitPrefs=True) in Spark (a recommendation system algorithm). Normally, after running this algorithm, the predicted values should fall between 0 and 1. But I received values greater than 1:

    "usn" : 72164,
    "recommendations" : [ 
        {
            "item_code" : "C1346",
            "rating" : 0.756096363067627
        }, 
        {
            "item_code" : "C0117",
            "rating" : 0.966064214706421
        }, 
        {
            "item_code" : "I0009",
            "rating" : 1.00000607967377
        }, 
        {
            "item_code" : "C0102",
            "rating" : 0.974934458732605
        }, 
        {
            "item_code" : "I0853",
            "rating" : 1.03272235393524
        }, 
        {
            "item_code" : "C0103",
            "rating" : 0.928574025630951
        }
    ]

I don't understand why, or what it means, that some ratings are greater than 1 ("rating" : 1.00000607967377 and "rating" : 1.03272235393524).

There is a similar question, but I still don't understand it: MLLib Spark - ALS trainImplicit value more than 1.

Can someone help me explain these unusual values?
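For context, here is a minimal PySpark sketch of the kind of pipeline that produces output like the document above. The data, column names, and parameters are hypothetical, and note that ALS requires numeric user and item IDs, so string codes like "C1346" would have to be indexed to integers first:

    # Hypothetical implicit-feedback pipeline; data and parameters are made up.
    from pyspark.sql import SparkSession
    from pyspark.ml.recommendation import ALS

    spark = SparkSession.builder.appName("als-implicit-sketch").getOrCreate()

    # Interaction counts (implicit feedback), not explicit 1-5 star ratings.
    events = spark.createDataFrame(
        [(72164, 0, 3.0), (72164, 1, 1.0), (10001, 0, 5.0), (10001, 2, 2.0)],
        ["usn", "item_id", "clicks"],
    )

    als = ALS(
        implicitPrefs=True,       # implicit-feedback mode, as in the question
        userCol="usn",
        itemCol="item_id",
        ratingCol="clicks",
        rank=10,
        regParam=0.1,
        coldStartStrategy="drop",
    )
    model = als.fit(events)

    # The "rating" column in the output is a raw dot product of latent
    # factors, which is why it is not confined to [0, 1].
    model.recommendForAllUsers(5).show(truncate=False)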

Answer

Don't worry about that! There is nothing wrong with ALS.

Nevertheless, as you saw, the prediction scores returned by ALS with implicit feedback in Apache Spark aren't normalized to fit between [0, 1]. You might even get negative values sometimes. (More on that here.)

ALS alternately computes (and re-computes) the user factors and the item factors at each step to minimize the cost function, using approximations that allow it to scale.
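For reference, Spark's implicit-feedback ALS follows the Hu, Koren & Volinsky (2008) formulation cited in the Spark documentation; written in LaTeX, the objective it minimizes is

    \min_{x, y} \sum_{u,i} c_{ui} \left( p_{ui} - x_u^\top y_i \right)^2
        + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right)

where p_ui is 1 if user u interacted with item i (0 otherwise) and c_ui = 1 + alpha * r_ui is a confidence weight. The prediction is the raw dot product x_u^T y_i: it approximates the binary p_ui, but nothing constrains it to [0, 1], which is why scores slightly above 1 or below 0 can appear.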

As a matter of fact, normalizing those scores isn't relevant, because the scores don't mean much on their own.

You can't use a per-example RMSE on those scores to evaluate the performance of your recommendations. If you are interested in evaluating this type of recommender, I advise you to read my answer on How can I evaluate the implicit feedback ALS algorithm for recommendations in Apache Spark?
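As a hedged sketch of what ranking-based evaluation can look like instead of RMSE, Spark's RankingMetrics computes precision@k and mean average precision over per-user (predicted, actual) item lists; the lists below are illustrative:

    # Ranking-based evaluation sketch; the per-user lists are made-up examples.
    from pyspark.sql import SparkSession
    from pyspark.mllib.evaluation import RankingMetrics

    spark = SparkSession.builder.appName("ranking-eval-sketch").getOrCreate()
    sc = spark.sparkContext

    # (recommended item ids, ground-truth item ids) per user, e.g. built by
    # joining recommendForAllUsers(k) against held-out interactions.
    predicted_and_actual = sc.parallelize([
        ([1, 2, 3], [1, 3]),   # two of the top-3 recommendations were relevant
        ([4, 5, 6], [7]),      # none of the recommendations were relevant
    ])

    metrics = RankingMetrics(predicted_and_actual)
    print(metrics.precisionAt(3))        # precision@3
    print(metrics.meanAveragePrecision)  # mean average precision (MAP)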

There are many techniques used in research and industry to deal with this type of result. For example, you can binarize the predictions using a threshold, as sketched below.
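In the thresholding sketch that follows, the 0.5 cutoff and the DataFrame layout are assumptions, not anything prescribed by ALS:

    # Binarize ALS scores with a threshold; the 0.5 cutoff is an assumption.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("binarize-sketch").getOrCreate()

    scores = spark.createDataFrame(
        [(72164, "I0009", 1.00000607967377), (72164, "C1346", 0.756096363067627)],
        ["usn", "item_code", "rating"],
    )

    # Scores above the threshold become 1 ("recommend"), the rest 0.
    binarized = scores.withColumn(
        "prediction", (F.col("rating") > F.lit(0.5)).cast("integer")
    )
    binarized.show()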

That concludes this article on handling Spark ALS recommendation predictions greater than 1; hopefully the answer above is helpful.
