I built a simple recommendation system for the MovieLens DB, inspired by https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html.

I also have problems with explicit training, like the ones described in "Apache Spark ALS collaborative filtering results. They don't make sense". Using implicit training (on both explicit and implicit data) gives me reasonable results, but explicit training doesn't.

While this is OK for me for now, I'm curious how to update a model. My current solution works like this:

1. have all user ratings
2. generate the model
3. get recommendations for a user

I want a flow like this:

1. have a base of ratings
2. generate the model once (optionally save and load it)
3. get some ratings by one user on 10 random movies (not in the model!)
4. get recommendations using the model and the new user's ratings

Therefore I must update my model without completely recomputing it. Is there any way to do so?

The first way is good for batch processing (like generating recommendations in nightly batches), while the second would be good for generating recommendations in near real time.
Solution

Edit: the following worked for me because I had implicit feedback ratings and was only interested in ranking the products for a new user.

You can actually get predictions for new users using the trained model, without updating it.

To get predictions for a user who is already in the model, you take that user's latent representation (a vector u_i of size f, the number of factors) and multiply it by the product latent factor matrix v (the latent representations of all M products, each a vector of size f, arranged as an f x M matrix). This gives you one score per product.

For a new user, the problem is that you don't have a latent representation. You only have the full representation: a vector of size M (the number of different products) holding the user's ratings or interactions. What you can do is compute an approximate latent representation for this user by multiplying the full representation by the transpose of the product matrix, which acts as a similarity function between the user and each factor.

That is, if your user latent matrix is u and your product latent matrix is v, then for user i in the model you get scores by doing:

u_i * v

For a new user you don't have a latent representation, so take the full representation full_u and do:

full_u * v^t * v

Here full_u * v^t approximates the latent factors of the new user, and multiplying by v again turns them into product scores. This should give reasonable recommendations, provided the model already gives reasonable recommendations for existing users.

To answer the question about training: this lets you compute predictions for new users without redoing the heavy model computation, which you now only need to run once in a while. So you can run your batch processing at night and still make predictions for new users during the day.

Note: MLlib gives you access to the matrices u and v.
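The fold-in described in the answer, full_u * v^t * v, can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not Spark code: the function names (fold_in, scores, recommend) and the tiny factor matrix are made up for the example, and v is assumed to be laid out as f rows of M values. In Spark MLlib the product factors would come from the trained model's productFeatures and would first have to be collected into that layout.

```python
# Minimal fold-in sketch in pure Python; all names and numbers are hypothetical.

def fold_in(full_u, v):
    """Approximate latent factors for a new user: full_u * v^t.

    full_u: list of length M, one rating/interaction per product (0 if none)
    v:      product factor matrix as a list of f rows, each of length M
    """
    return [sum(r * x for r, x in zip(full_u, row)) for row in v]


def scores(latent_u, v):
    """Score every product: latent_u * v, a list of length M."""
    num_products = len(v[0])
    return [sum(latent_u[k] * v[k][j] for k in range(len(v)))
            for j in range(num_products)]


def recommend(full_u, v, top_n=3):
    """Rank the products the new user has not interacted with yet."""
    s = scores(fold_in(full_u, v), v)
    unseen = [j for j, r in enumerate(full_u) if r == 0]
    return sorted(unseen, key=lambda j: s[j], reverse=True)[:top_n]


# Made-up product factors: f = 2 factors, M = 4 products.
v = [
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.1, 1.0, 0.9],
]
# New user who interacted only with product 0.
new_user = [1.0, 0.0, 0.0, 0.0]
top = recommend(new_user, v, top_n=2)  # product indices, best first
```

With these toy factors the new user folds in to the latent vector [1.0, 0.0], so product 1, which shares the first factor with product 0, ranks ahead of the others. The scores are only meaningful as a ranking, which matches the caveat in the edit note above about implicit feedback.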