Why does a spark-ml ALS model return NaN and negative predictions?

Problem description

I'm trying to use ALS from spark-ml with implicit ratings.

I noticed that some of the predictions given by my trained model are negative or NaN. Why is that?

Solution

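Apache Spark provides an option to enforce a non-negativity constraint on ALS.

So, to get rid of the negative values, you just need to set:

Python:

nonnegative=True

Scala:

setNonnegative(true)

when creating your ALS model, i.e.:

>>> als = ALS(rank=10, maxIter=5, seed=0, nonnegative=True)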
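For context, here is a slightly fuller sketch of the same setting in PySpark. It assumes implicit ratings as in the question; the column names (userId, itemId, rating) and the helper name fit_als are hypothetical and only there for illustration. Since a prediction is the dot product of a user vector and an item vector, non-negative factors cannot produce a negative score.

```python
# Keyword arguments for pyspark.ml.recommendation.ALS.
# Column names are hypothetical; change them to match your DataFrame.
ALS_PARAMS = {
    "rank": 10,
    "maxIter": 5,
    "seed": 0,
    "implicitPrefs": True,  # implicit ratings, as in the question
    "nonnegative": True,    # constrain the factors to be non-negative
    "userCol": "userId",
    "itemCol": "itemId",
    "ratingCol": "rating",
}


def fit_als(train_df):
    """Fit ALS on a Spark DataFrame; requires an active SparkSession."""
    # Imported here so the settings above can be inspected without Spark.
    from pyspark.ml.recommendation import ALS

    return ALS(**ALS_PARAMS).fit(train_df)
```

Calling fit_als(train_df) returns the fitted model, and model.transform(test_df) then adds a prediction column to the test DataFrame.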

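Non-negative matrix factorization (NMF or NNMF), also called non-negative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra in which a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements [Ref. Wikipedia].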
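To make the definition concrete, here is a minimal pure-Python sketch of NMF on a toy matrix. It uses the multiplicative update rule commonly attributed to Lee and Seung, which is not what Spark's ALS does internally (Spark constrains its least-squares solves instead); the matrix V and the function names are made up for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Start from strictly positive random factors.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        numer = matmul(wt, v)
        denom = matmul(matmul(wt, w), h)
        h = [[h[k][j] * numer[k][j] / (denom[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        numer = matmul(v, ht)
        denom = matmul(w, matmul(h, ht))
        w = [[w[i][k] * numer[i][k] / (denom[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy non-negative matrix (hypothetical data).
V = [[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [1.0, 1.0, 5.0]]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Each update only multiplies and divides non-negative numbers, so W and H stay non-negative throughout.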

If you want to read more about NMF, I'd recommend reading the following paper:

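As for the NaN values, they usually come from splitting your dataset: a user or item that is present only in the test set and not in the training set is unseen by the model, which has no learned factors for it, so its prediction is NaN. This can also happen if you cross-validate your training. On that matter, there are a couple of JIRAs that are marked as resolved for 2.2:

The latest one will allow you to set the cold start strategy to use when creating your model.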
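The toy sketch below shows where the NaN comes from and what a cold start strategy changes. It is a simplified stand-in for the model, not Spark's code: the factor values and the predict function are hypothetical, and only the two strategy names, "nan" and "drop", are taken from Spark's coldStartStrategy parameter.

```python
import math


def predict(user_factors, item_factors, pairs, cold_start_strategy="nan"):
    """Score (user, item) pairs with a dot product of toy latent factors.

    "nan" keeps unseen pairs with a NaN score; "drop" leaves them out.
    """
    if cold_start_strategy not in ("nan", "drop"):
        raise ValueError("cold_start_strategy must be 'nan' or 'drop'")
    results = []
    for user, item in pairs:
        u = user_factors.get(user)
        v = item_factors.get(item)
        if u is None or v is None:
            # No factors were learned for this user or item.
            if cold_start_strategy == "nan":
                results.append((user, item, math.nan))
            continue
        results.append((user, item, sum(a * b for a, b in zip(u, v))))
    return results


# Hypothetical factors, as if learned from a training split that
# contained only users u1, u2 and items i1, i2, i3.
user_factors = {"u1": [0.5, 1.0], "u2": [1.0, 0.2]}
item_factors = {"i1": [1.0, 0.5], "i2": [0.2, 0.8], "i3": [0.4, 0.4]}

# The test split also holds user u3 and item i4, which training never saw.
test_pairs = [("u1", "i3"), ("u3", "i1"), ("u2", "i4")]

with_nan = predict(user_factors, item_factors, test_pairs, "nan")
dropped = predict(user_factors, item_factors, test_pairs, "drop")
print(with_nan)
print(dropped)
```

In PySpark 2.2 or later the corresponding setting is passed to the estimator, for example ALS(rank=10, maxIter=5, seed=0, nonnegative=True, coldStartStrategy="drop"). Dropping the unseen rows is useful when an evaluation metric would otherwise come out as NaN.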



10-11 22:42