Problem Description
I would like to concatenate several trained Pipelines into one, similar to "Spark add new fitted stage to a exitsting PipelineModel without fitting again"; however, the solution given there is for PySpark:
> pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0, PipelineModel's constructor is marked as private, hence it cannot be called from outside. In the Pipeline class, only the fit method creates a PipelineModel:
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
Recommended Answer
There is nothing* wrong with using Pipeline and invoking the fit method. If a stage is a Transformer, and a PipelineModel is**, fit works like the identity.
You can check the relevant Python code:
if isinstance(stage, Transformer):
    transformers.append(stage)
    dataset = stage.transform(dataset)
and the Scala code:
case t: Transformer =>
  t
This means that the fitting process will only validate the schema and create a new PipelineModel object.
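Concretely, a minimal sketch of that workaround (pipeModel1, pipeModel2 and df are hypothetical names for two already-fitted PipelineModels and a DataFrame with a compatible schema):
import org.apache.spark.ml.{Pipeline, PipelineStage}

// Each fitted PipelineModel is itself a Transformer, so it can be used as a stage.
// Calling fit here only validates the schema; no stage is re-trained.
val combined = new Pipeline()
  .setStages(Array[PipelineStage](pipeModel1, pipeModel2))
  .fit(df)

val dfFinal = combined.transform(df)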
* The only possible concern is the presence of non-lazy Transformers, though, with the exception of the deprecated OneHotEncoder, the Spark core API doesn't provide any such transformers.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala:
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
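Since a fitted PipelineModel is itself a Transformer, the trained models can also be applied back to back without building a new Pipeline at all; with the same hypothetical pipeModel1, pipeModel2 and df as above:
val dfChained = pipeModel2.transform(pipeModel1.transform(df))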