Question
I noticed there are two LinearRegressionModel classes in Spark ML: one in the ML package and another in the MLlib package.
The two are implemented quite differently. For example, the one from MLlib implements Serializable, while the other one does not.
By the way, the same is true of RandomForestModel.
Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?
Recommended answer
o.a.s.mllib (org.apache.spark.mllib) contains the old RDD-based API, while o.a.s.ml (org.apache.spark.ml) contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened in the case of linear regression) and will most likely be removed in the next major release.
So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.
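On the conversion part of the question, the answer above does not name a built-in converter. The following is only a minimal sketch of one possible approach, assuming Spark 2.0 or later: fit a model with the Dataset-based API, then copy its coefficients and intercept into the RDD-based class. The object name, the toy data, and the local master setting are illustrative assumptions, and this is not presented as an official migration path.

```scala
import org.apache.spark.ml.linalg.{Vectors => NewVectors}
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel => NewModel}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.regression.{LinearRegressionModel => OldModel}
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object MlToMllibSketch {
  def main(args: Array[String]): Unit = {
    // Local session for the example; a real job would configure this differently.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("ml-to-mllib-sketch")
      .getOrCreate()

    // Toy training data (made up for the example): label is roughly 2 * x + 1.
    val training = spark.createDataFrame(Seq(
      (3.0, NewVectors.dense(1.0)),
      (5.0, NewVectors.dense(2.0)),
      (7.0, NewVectors.dense(3.0))
    )).toDF("label", "features")

    // Dataset-based API (o.a.s.ml): the estimator returns a pipeline-friendly model.
    val newModel: NewModel = new LinearRegression().fit(training)

    // RDD-based API (o.a.s.mllib): the model is essentially weights plus intercept,
    // so the fitted parameters can be copied over. Vectors.fromML converts an
    // ml.linalg vector into an mllib.linalg vector.
    val oldModel = new OldModel(
      OldVectors.fromML(newModel.coefficients),
      newModel.intercept
    )

    println(s"ml coefficients: ${newModel.coefficients}, intercept: ${newModel.intercept}")
    println(s"mllib prediction for x = 4.0: ${oldModel.predict(OldVectors.dense(4.0))}")

    spark.stop()
  }
}
```

Copying parameters this way carries over only the linear coefficients and intercept. Anything attached to the ml model as pipeline metadata or a training summary does not transfer, and whether a similar copy is practical for other model types such as RandomForestModel is not covered here.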