Problem description
I have a saved PipelineModel:
pipe_model = pipe.fit(df_train)
pipe_model.write().overwrite().save("/user/pipe_text_2")
Now I want to add a new, already fitted PipelineModel to this pipeline:
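from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.clustering import KMeans

pipe_model = PipelineModel.load("/user/pipe_text_2")
df2 = pipe_model.transform(df1)  # output of the already fitted pipeline
kmeans = KMeans(k=20)
pipe2 = Pipeline(stages=[kmeans])
pipe_model2 = pipe2.fit(df2)  # fit only the new KMeans stage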
Is it possible to combine the two without fitting again, so that I get a new PipelineModel rather than a new, unfitted Pipeline? Ideally, something like the following would work:
pipe_model_new = pipe_model + pipe_model2
TypeError: unsupported operand type(s) for +: 'PipelineModel' and 'PipelineModel'
I found the question "Join two Spark mllib pipelines together", but that solution requires fitting the whole pipeline again, which is exactly what I am trying to avoid.
Recommended answer
Since PipelineModels are themselves valid stages for the PipelineModel class, you should be able to use the following, which does not require fitting again:
pipe_model_new = PipelineModel(stages=[pipe_model, pipe_model2])
final_df = pipe_model_new.transform(df1)
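The answer above nests the two fitted models as two stages. As a possible variation, the sketch below concatenates their fitted stages into one flat list before constructing the PipelineModel. The helper names flatten_stages and combine_fitted are hypothetical and not part of PySpark; the sketch assumes every argument is an already fitted model, and it relies only on the stages attribute and the PipelineModel(stages=...) constructor used above.

```python
def flatten_stages(models):
    """Collect the fitted stages of several models into one flat list.

    Anything exposing a `stages` attribute (as a fitted PipelineModel does)
    is expanded one level; any other fitted transformer is kept as a
    single stage. Hypothetical helper, not part of PySpark.
    """
    flat = []
    for model in models:
        flat.extend(getattr(model, "stages", [model]))
    return flat


def combine_fitted(*models):
    """Build one PipelineModel from already fitted models, without calling fit()."""
    # Imported here so flatten_stages stays usable without a Spark installation.
    from pyspark.ml import PipelineModel

    return PipelineModel(stages=flatten_stages(models))
```

With the objects from the question, this would be called as pipe_model_new = combine_fitted(pipe_model, pipe_model2), followed by pipe_model_new.transform(df1). Both variants apply the same fitted transformers in the same order; the flat form simply keeps the stage list one level deep.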