What is the difference between the Spark ML and MLlib packages?

Question

I noticed there are two LinearRegressionModel classes in Spark ML, one in the ml package and another in the mllib package.

The two are implemented quite differently: for example, the one from mllib implements Serializable, while the other does not.

By the way, the same is true of RandomForestModel.

Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?

Recommended answer

o.a.s.mllib (org.apache.spark.mllib) contains the old RDD-based API, while o.a.s.ml (org.apache.spark.ml) contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened in the case of linear regression) and will most likely be removed in the next major release.

So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.
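To make the difference concrete, here is a minimal sketch (not part of the original answer) of how training data is shaped for each package: an RDD of LabeledPoint for mllib, a DataFrame with a vector column for ml. The toy data, the column names "x", "label" and "features", the object name and the local master setting are all assumptions chosen for illustration, and the sketch assumes Spark 2.0 or later is on the classpath.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object MlVersusMllibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-versus-mllib-sketch")
      .master("local[*]") // assumption: run locally for the example
      .getOrCreate()
    import spark.implicits._

    // Old API (o.a.s.mllib): training data is an RDD of LabeledPoint.
    val oldStyleData = spark.sparkContext.parallelize(Seq(
      LabeledPoint(2.0, OldVectors.dense(1.0)),
      LabeledPoint(4.0, OldVectors.dense(2.0))
    ))
    println(s"mllib-style records: ${oldStyleData.count()}")

    // New API (o.a.s.ml): training data is a DataFrame with named columns.
    val df = Seq((1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)).toDF("x", "label")

    // ml estimators expect the features packed into a single vector column.
    val training = new VectorAssembler()
      .setInputCols(Array("x"))
      .setOutputCol("features")
      .transform(df)

    // fit() returns org.apache.spark.ml.regression.LinearRegressionModel.
    val model = new LinearRegression().setMaxIter(10).fit(training)
    println(s"coefficients: ${model.coefficients}, intercept: ${model.intercept}")

    spark.stop()
  }
}
```

The mllib half only builds the input RDD and does not train a model, since the point here is the data shape each package expects rather than a full side-by-side comparison.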

