Why does ALS.trainImplicit give better predictions for explicit ratings?


Question

Edit: I tried a standalone Spark application (instead of PredictionIO) and observed the same results. So this is not a PredictionIO issue, but it is still confusing.

I am using PredictionIO 0.9.6 and the Recommendation template for collaborative filtering. The ratings in my data set are numbers between 1 and 10. When I first trained a model with the template's defaults (using ALS.train), the predictions were horrible, at least subjectively. Scores ranged up to 60.0 or so, but the recommendations seemed totally random.

Somebody suggested that ALS.trainImplicit does a better job, so I changed src/main/scala/ALSAlgorithm.scala accordingly:

val m = ALS.trainImplicit(  // instead of ALS.train
  ratings = mllibRatings,
  rank = ap.rank,
  iterations = ap.numIterations,
  lambda = ap.lambda,
  blocks = -1,
  alpha = 1.0,  // also added this line
  seed = seed)

Scores are much lower now (below 1.0), but the recommendations are in line with the personal ratings. Much better, but also confusing. PredictionIO defines the difference between explicit and implicit this way:

and:

Is the documentation wrong? I still think that explicit feedback fits my use case. Maybe I need to adapt the template with ALS.train in order to get useful recommendations? Or did I just misunderstand something?

Answer

A lot of it depends on how you gathered the data. Ratings that seem explicit can often be implicit in practice. For instance, assume you let users rate only items that they have purchased or used before. The very fact that they spent time evaluating a particular item then suggests that the item is of high quality, while items of poor quality are not rated at all because people do not even bother to use them. So even though the dataset is intended to be explicit, you may get better results if you treat the ratings as implicit. Again, this varies significantly based on how the data was obtained.
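To make "treating the ratings as implicit" concrete, here is a minimal sketch of the mapping usually associated with implicit-feedback ALS: each rating is split into a binary preference (1 if the user rated the item positively, 0 otherwise) and a confidence weight of 1 + alpha * |rating|. The names ImplicitSignal, ImplicitFeedbackSketch and toImplicit are hypothetical and exist in neither MLlib nor PredictionIO; the sketch assumes that ALS.trainImplicit follows this formulation and does not reproduce its actual code.

```scala
// Hypothetical illustration only: these names are not part of MLlib or PredictionIO.
final case class ImplicitSignal(preference: Double, confidence: Double)

object ImplicitFeedbackSketch {
  // Assumed mapping: preference is 1 for any positive rating and 0 otherwise;
  // confidence grows linearly with the rating's magnitude, scaled by alpha.
  def toImplicit(rating: Double, alpha: Double): ImplicitSignal = {
    val preference = if (rating > 0.0) 1.0 else 0.0
    val confidence = 1.0 + alpha * math.abs(rating)
    ImplicitSignal(preference, confidence)
  }

  def main(args: Array[String]): Unit = {
    val alpha = 1.0 // same value as in the question's trainImplicit call
    // 0.0 stands in for a user-item pair that has no rating at all
    val ratings = Seq(0.0, 1.0, 5.0, 10.0)
    ratings.foreach { r =>
      val s = toImplicit(r, alpha)
      println(f"rating=$r%4.1f preference=${s.preference}%.1f confidence=${s.confidence}%4.1f")
    }
  }
}
```

Under this mapping a rating of 1 and a rating of 10 both become preference 1 and differ only in confidence (2 versus 11 with alpha = 1.0), while unrated pairs count as low-confidence zeros. Because the model is then fitted to preferences rather than to the 1 to 10 scale, this would be consistent with the question's scores staying below 1.0. It also shows the caveat: a low rating still counts as a positive signal, which only makes sense if rating an item at all is evidence of interest.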
