Problem description
I tried a standalone Spark application (instead of PredictionIO) and my observations are the same. So this is not a PredictionIO issue, but still confusing.
I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with defaults from the template (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to about 60.0, but the recommendations seemed totally random.
Somebody suggested that ALS.trainImplicit did a better job, so I changed src/main/scala/ALSAlgorithm.scala accordingly:
val m = ALS.trainImplicit( // instead of ALS.train
  ratings = mllibRatings,
  rank = ap.rank,
  iterations = ap.numIterations,
  lambda = ap.lambda,
  blocks = -1,  // -1 lets MLlib auto-configure the number of blocks
  alpha = 1.0,  // also added this line: confidence scaling for implicit feedback
  seed = seed)
Scores are much lower now (below 1.0) but the recommendations are in line with the personal ratings. Much better, but also confusing. PredictionIO defines the difference between explicit and implicit this way:
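Explicit preference (also referred to as "Explicit Feedback"), such as a "rating" given to items by users. Implicit preference (also referred to as "Implicit Feedback"), such as "view" and "buy" history.
and:
By default, the Recommendation template uses ALS.train(), which expects explicit rating values that users have given to items.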
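To see why the two calls produce scores on such different scales, it helps to compare what each one minimizes. This is a simplified sketch based on the implicit-feedback formulation of Hu, Koren and Volinsky that MLlib's implicit mode follows; $x_u$ and $y_i$ are the user and item factor vectors, and MLlib's extra scaling of $\lambda$ by each user's and item's rating count is left out.

```latex
% ALS.train (explicit): only rated pairs (u, i) contribute to the loss
\min_{X,Y} \sum_{(u,i)\ \text{rated}} \left(r_{ui} - x_u^\top y_i\right)^2
  + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)

% ALS.trainImplicit: every pair (u, i) contributes, weighted by confidence c_{ui}
\min_{X,Y} \sum_{\text{all } (u,i)} c_{ui}\left(p_{ui} - x_u^\top y_i\right)^2
  + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right)

p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

In the explicit objective the model is never penalized for the score it assigns to an unrated item, so those scores are free to drift outside the 1 to 10 range, which is one plausible reading of the scores near 60. In the implicit objective every pair is pulled toward 0 or 1, which is consistent with scores below 1.0, and the rating value only acts as a weight.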
Is the documentation wrong? I still think that explicit feedback fits my use case. Maybe I need to adapt the template with ALS.train in order to get useful recommendations? Or did I just misunderstand something?
Recommended answer
A lot of it depends on how you gathered the data. Ratings that seem explicit can often behave like implicit feedback. For instance, assume you let users rate items that they have purchased or used before. The very fact that someone spent time evaluating a particular item implies that the item is of reasonably high quality, while items of poor quality are not rated at all because people do not even bother to use them. So even though the dataset is intended to be explicit, you may get better results if you treat the ratings as implicit. Again, this varies significantly based on how the data is obtained.
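The answer's point can be made concrete by looking at how an implicit-feedback model reinterprets a rating: it is split into a binary preference and a confidence weight. The following is a standalone Scala sketch of that mapping, following the Hu, Koren and Volinsky formulation. The object ImplicitView and its methods are hypothetical names, not part of Spark or PredictionIO, and the use of |r| in the confidence is an assumption about MLlib's behaviour (for ratings between 1 and 10 it makes no difference).

```scala
// Hypothetical illustration of how implicit-feedback ALS views a rating r.
object ImplicitView {
  // Binary preference: any positive rating counts as "interacted with",
  // so a 1/10 and a 10/10 both become 1.0.
  def preference(r: Double): Double = if (r > 0.0) 1.0 else 0.0

  // Confidence weight: the rating only controls how strongly the preference
  // is trusted. Assumed to use |r|, so negative values also get weight.
  def confidence(r: Double, alpha: Double): Double = 1.0 + alpha * math.abs(r)

  def main(args: Array[String]): Unit = {
    val alpha = 1.0 // same value as in the modified ALSAlgorithm.scala
    for (r <- Seq(1.0, 5.0, 10.0)) {
      println(f"rating=$r%.1f -> preference=${preference(r)}%.1f, confidence=${confidence(r, alpha)}%.1f")
    }
  }
}
```

Every rated item, whether rated 1 or 10, becomes a positive preference; a low rating only lowers how much weight that preference gets. If low ratings in a data set genuinely mean "disliked", that signal is largely lost in the implicit view, which is the trade-off that depends on how the data was collected.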