Question
I noticed there are two LinearRegressionModel classes in Spark ML, one in the ML package (spark.ml) and another in the MLlib package (spark.mllib).
The two are implemented quite differently. For example, the one from MLlib implements Serializable, while the other does not.
By the way, the same is true of RandomForestModel and Word2Vec.
Why are there two classes? Which is the "right" one? And is there a way to convert one into the other?
Recommended answer
o.a.s.mllib (org.apache.spark.mllib) contains the old RDD-based API, while o.a.s.ml (org.apache.spark.ml) contains the new API built around Dataset and ML Pipelines. ml and mllib reached feature parity in 2.0.0, and mllib is slowly being deprecated (this has already happened for linear regression) and will most likely be removed in the next major release.
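To make the difference in input types concrete, here is a minimal Scala sketch. It assumes Spark 2.x or later on the classpath and a local master. The object name, app name, and toy data are made up for illustration and do not come from the original answer. It fits a model with the Dataset-based o.a.s.ml API, and builds (but does not train on) the RDD[LabeledPoint] input that the older o.a.s.mllib API expects.

```scala
import org.apache.spark.ml.linalg.{Vectors => MLVectors}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.mllib.linalg.{Vectors => MLlibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this sketch.
object MlVsMllibSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ml-vs-mllib-sketch") // illustrative name
      .master("local[*]")            // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // o.a.s.ml: input is a DataFrame with "label" and "features" columns
    // (the default column names LinearRegression looks for).
    // Toy data, invented for illustration: label = 2 * x.
    val training = Seq(
      (2.0, MLVectors.dense(1.0)),
      (4.0, MLVectors.dense(2.0)),
      (6.0, MLVectors.dense(3.0))
    ).toDF("label", "features")

    // fit() returns an org.apache.spark.ml.regression.LinearRegressionModel.
    val model = new LinearRegression().setMaxIter(10).fit(training)
    println(s"coefficients=${model.coefficients} intercept=${model.intercept}")

    // o.a.s.mllib: input is an RDD[LabeledPoint] built from mllib vectors.
    // Shown only for contrast; no mllib model is trained here.
    val legacyInput: RDD[LabeledPoint] = spark.sparkContext.parallelize(Seq(
      LabeledPoint(2.0, MLlibVectors.dense(1.0)),
      LabeledPoint(4.0, MLlibVectors.dense(2.0)),
      LabeledPoint(6.0, MLlibVectors.dense(3.0))
    ))
    println(s"legacy RDD rows=${legacyInput.count()}")

    spark.stop()
  }
}
```

Note that the two packages use distinct vector types (org.apache.spark.ml.linalg versus org.apache.spark.mllib.linalg), which is why the imports are aliased here.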
So unless your goal is backward compatibility, the "right choice" is o.a.s.ml.