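I have a MatrixFactorizationModel object. If I recommend products to a single user right after building the model with ALS.train(...), it takes about 300 ms (on my data and hardware). But if I save the model to disk and load it back, the same recommendation takes almost 2000 ms. Spark also logs these warnings: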
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: User factor is not cached. Prediction could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor does not have a partitioner. Prediction on individual records could be slow.
15/07/17 11:05:47 WARN MatrixFactorizationModel: Product factor is not cached. Prediction could be slow.
After loading the model, how can I create/set a partitioner and cache the user and product factors? The following approach did not help:
model.userFeatures().cache();
model.productFeatures().cache();
I also tried repartitioning these RDDs and creating a new model from the repartitioned versions, but that did not help either.
Best answer
In Scala you don't need the parentheses: userFeatures takes no arguments and is an RDD of (Int, Array[Double]).
This should help:
model.userFeatures.cache
model.productFeatures.cache
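Note that cache only marks an RDD for caching; nothing is stored until an action runs on it, so the first recommendation after loading still pays the full cost of reading the factors. A minimal sketch of one way to address both warnings is below, assuming Spark 1.3 or later (where MatrixFactorizationModel.load exists). The helper name loadPartitionedAndCached and the numPartitions parameter are hypothetical, not part of Spark. The question reports that a similar repartitioning attempt did not help, so treat this as something to try rather than a confirmed fix; the main difference is that it forces the cache to fill with count() before the model is used.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.storage.StorageLevel

// Hypothetical helper: load a saved model, give both factor RDDs a
// partitioner, cache them, and wrap them in a new model.
def loadPartitionedAndCached(
    sc: SparkContext,
    path: String,
    numPartitions: Int): MatrixFactorizationModel = {
  val loaded = MatrixFactorizationModel.load(sc, path)
  val partitioner = new HashPartitioner(numPartitions)

  // partitionBy attaches a partitioner, so a lookup by user id only
  // has to scan one partition instead of all of them.
  val users = loaded.userFeatures.partitionBy(partitioner).cache()
  val products = loaded.productFeatures.partitionBy(partitioner).cache()

  // cache() is lazy: run an action so the factors are actually
  // materialized in memory before the first prediction.
  users.count()
  products.count()

  // Build a new model around the partitioned, cached RDDs.
  new MatrixFactorizationModel(loaded.rank, users, products)
}
```

With a helper like this, calling val model = loadPartitionedAndCached(sc, path, 8) once at startup and then model.recommendProducts(userId, 10) for each request would be the intended usage; the partition count of 8 is only an illustrative value.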
This question about apache-spark (saving/loading a MatrixFactorizationModel correctly) comes from Stack Overflow: https://stackoverflow.com/questions/31479240/