Getting wrong recommendations with ALS.recommendation

Question

I wrote a Spark program for making recommendations, using the ALS.recommendation library. I ran a small test with the following dataset, called trainData:

(u1, m1, 1)
(u1, m4, 1)
(u2, m2, 1)
(u2, m3, 1)
(u3, m1, 1)
(u3, m3, 1)
(u3, m4, 1)
(u4, m3, 1)
(u4, m4, 1)
(u5, m2, 1)
(u5, m4, 1)

The first column contains the user, the second the item rated by that user, and the third the rating.

In my code, written in Scala, I trained the model using:

val myModel = ALS.trainImplicit(trainData, 3, 5, 0.01, 1.0) // rank = 3, iterations = 5, lambda = 0.01, alpha = 1.0

I tried to retrieve some recommendations for u1 using this instruction:

val recommendations = myModel.recommendProducts(idUser, 2)

where idUser contains the ID assigned to user u1. As recommendations, I obtain:

(u1, m1, 1.0536233346170754)
(u1, m4, 0.8540954252858661)
(u1, m3, 0.09069877419040584)
(u1, m2, -0.1345521479521654)

As you can see, the first two lines show that the recommended items are the ones u1 had already rated (m1 and m4). Whichever user I select, I always get the same behavior: the first items recommended are the ones the user has already rated.

I find this weird! Is there a problem somewhere?

Recommended answer

I think that is the expected behaviour of recommendProducts. When you train a matrix factorization algorithm such as ALS, you are trying to find a rating that relates each user to each item.

ALS does this based on the items the user has already rated, so when you look for recommendations for a given user, the model is most sure about the ratings it has already seen. As a result, most of the time it will recommend products the user has already rated.
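To make that a bit more concrete: as far as I know, trainImplicit follows the implicit-feedback formulation of Hu, Koren and Volinsky. The objective below is a sketch of that formulation, not something taken from the question; the symbols x_u (user factors), y_i (item factors), p_ui (preference) and c_ui (confidence) are my notation, and Spark may scale the regularization term differently.

```latex
\min_{x,\,y} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\Bigr),
\qquad
p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

With alpha = 1.0 and every rating equal to 1, the observed pairs get confidence 2 and are pulled towards a score of 1, while the unobserved pairs get confidence 1 and are pulled towards 0. That is consistent with the scores in the question: close to 1 for m1 and m4, close to 0 for m3 and m2.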

What you need to do is keep a list of the products each user has rated and filter them out when making the recommendations.
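The filtering step could look roughly like the sketch below. It is plain Scala with no Spark dependency; the object name FilterSeen and the helper dropSeen are hypothetical names I made up for illustration. It assumes the scored list is already sorted by descending score, which is how recommendProducts returns its results.

```scala
object FilterSeen {
  /** Keeps the first `num` items of `scored` that are not in `seen`.
    * Assumes `scored` is already sorted by descending score.
    */
  def dropSeen[A](scored: Seq[(A, Double)], seen: Set[A], num: Int): Seq[(A, Double)] =
    scored
      .filterNot { case (item, _) => seen.contains(item) } // drop already-rated items
      .take(num)                                           // keep the best remaining ones

  def main(args: Array[String]): Unit = {
    // Scores reported for u1 in the question, in the order they were returned.
    val scored = Seq(
      "m1" -> 1.0536233346170754,
      "m4" -> 0.8540954252858661,
      "m3" -> 0.09069877419040584,
      "m2" -> -0.1345521479521654
    )
    // Items u1 rated in trainData.
    val seen = Set("m1", "m4")

    // Only m3 and m2 are left as candidates; m3 comes first.
    println(dropSeen(scored, seen, 2))
  }
}
```

With the model from the question, the candidates would come from something like myModel.recommendProducts(idUser, num + seen.size).map(r => (r.product, r.rating)), where seen holds the products that user rated in trainData. Asking for num + seen.size candidates leaves enough items after filtering, as long as the catalogue has that many products.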

I dug a bit into the source code and the documentation to be sure of what I was saying.

ALS.recommendProducts is implemented in the class MatrixFactorizationModel (source code). You can see there that, when making recommendations, the model doesn't care whether the user has already rated the item.

You should also note that if you are using implicit ratings, you most definitely want to recommend products the user has already implicitly rated. Imagine the case where your implicit ratings are page views of a product in an online store, and what you want is for the user to buy the product.

I don't have access to the book Advanced Analytics with Spark, so I can't comment on the explanations and examples there.

Documentation:

MatrixFactorizationModel


08-27 14:18