Problem Description
I would like to concatenate several trained Pipelines into one, similar to "Spark add new fitted stage to a exitsting PipelineModel without fitting again"; however, the solution given there is for PySpark:
> pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
> final_df = pipe_model_new.transform(df1)
In Apache Spark 2.0, PipelineModel's constructor is marked as private, hence it cannot be called from outside. In the Pipeline class, only the fit method creates a PipelineModel:
val pipelineModel = new PipelineModel("randomUID", trainedStages)
val df_final_full = pipelineModel.transform(df)
Error:(266, 26) constructor PipelineModel in class PipelineModel cannot be accessed in class Preprocessor
val pipelineModel = new PipelineModel("randomUID", trainedStages)
Recommended Answer
There is nothing* wrong with using Pipeline and invoking the fit method. If a stage is a Transformer, and a PipelineModel is**, fit works like the identity.
You can check the relevant Python code:
if isinstance(stage, Transformer):
    transformers.append(stage)
    dataset = stage.transform(dataset)
and the Scala code:
case t: Transformer =>
  t
This means that the fitting process will only validate the schema and create a new PipelineModel object.
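Concretely, a minimal sketch of that workaround (pipeModel1, pipeModel2 and df are hypothetical names for two already-fitted PipelineModels and a DataFrame with a compatible schema):
import org.apache.spark.ml.{Pipeline, PipelineStage}

// Each fitted PipelineModel is itself a Transformer, so it can be used as a stage.
// Calling fit here only validates the schema; no stage is re-trained.
val combined = new Pipeline()
  .setStages(Array[PipelineStage](pipeModel1, pipeModel2))
  .fit(df)

val dfFinal = combined.transform(df)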
* The only possible concern is the presence of non-lazy Transformers, though, with the exception of the deprecated OneHotEncoder, the Spark core API doesn't provide any such transformers.
** In Python:
from pyspark.ml import Transformer, PipelineModel
issubclass(PipelineModel, Transformer)
True
In Scala:
import scala.reflect.runtime.universe.typeOf
import org.apache.spark.ml._
typeOf[PipelineModel] <:< typeOf[Transformer]
Boolean = true
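Since a fitted PipelineModel is itself a Transformer, the trained models can also be applied back to back without building a new Pipeline at all; with the same hypothetical pipeModel1, pipeModel2 and df as above:
val dfChained = pipeModel2.transform(pipeModel1.transform(df))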