Unable to serialize logistic regression in MLeap

Question

java.lang.AssertionError: assertion failed: This op only supports binary logistic regression

I am trying to serialize a Spark pipeline with MLeap.

My pipeline uses Tokenizer, HashingTF and LogisticRegression.

When I try to serialize the pipeline, I get the error above. Here is the code I am using to serialize it:

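    // Imports this snippet relies on (MLeap Spark support and scala-arm)
    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat
    import ml.combust.mleap.spark.SparkSupport._
    import resource._

    // Pipeline(pipelineConfig), data and sc are defined elsewhere in my code
    val pipeline = Pipeline(pipelineConfig)
    val model = pipeline.fit(data)

    // Write the fitted pipeline to a zip bundle in JSON format
    (for (bf <- managed(BundleFile("jar:file:/tmp/abc.model.twitter.zip"))) yield {
        model.writeBundle.format(SerializationFormat.Json).save(bf).get
    }).tried.get

    sc.stop()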

According to the documentation, MLeap supports logistic regression, so I have no idea what I am doing wrong here.

Recommended answer

yashdosi,

MLeap defaults to supporting Spark 2.0 (sorry, this isn't well documented). Spark 2.0 supported only binary logistic regression; multinomial logistic regression arrived in 2.1. Because MLeap is meant to support 2.0.0 and up, we built in a mechanism for selecting which version of Spark you are using. MLeap currently supports 2.0 and 2.1, but defaults to 2.0.
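To see whether a fitted pipeline actually contains a multinomial model, one option is to read `numClasses` from each logistic regression stage. The following is a minimal sketch, not part of MLeap: the object `LrCheck` and the helper `lrClassCounts` are hypothetical names introduced here, and it assumes the fitted model is a Spark `PipelineModel` whose `stages` can be passed in.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.classification.LogisticRegressionModel

object LrCheck {
  // Hypothetical helper: collects the class count of every
  // logistic regression stage found among the fitted stages.
  def lrClassCounts(stages: Seq[Transformer]): Seq[Int] =
    stages.collect { case lr: LogisticRegressionModel => lr.numClasses }
}
```

With the question's code this could be called as `LrCheck.lrClassCounts(model.stages.toSeq)`. A count above 2 would indicate a multinomial model, which by the explanation above needs the Spark 2.1 transformers.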

Try adding this line to the application.conf file in your resources directory. It tells MLeap to use the Spark 2.1 transformers when serializing:

    // application.conf in src/main/resources
    ml.combust.mleap.spark.registry.default = ${ml.combust.mleap.spark.registry.v21}
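If the setting seems to have no effect, it may help to confirm that the file is really on the classpath and that the path resolves. The sketch below uses the Typesafe Config library to do that. The object `RegistryCheck` and its method are hypothetical names introduced here, and the shape of the rendered value depends on MLeap's own reference.conf, which is not shown in this thread.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object RegistryCheck {
  // Path taken from the application.conf line above
  val RegistryPath = "ml.combust.mleap.spark.registry.default"

  // Renders whatever the path resolves to, or None if it is not set
  def resolvedRegistry(config: Config): Option[String] =
    if (config.hasPath(RegistryPath)) Some(config.getValue(RegistryPath).render())
    else None

  def main(args: Array[String]): Unit = {
    // ConfigFactory.load() merges application.conf with every
    // reference.conf on the classpath and resolves substitutions
    val rendered = resolvedRegistry(ConfigFactory.load())
    println(rendered.getOrElse(s"$RegistryPath is not set"))
  }
}
```

If the `${...}` substitution cannot be resolved, for example because the MLeap Spark jar is missing from the classpath, `ConfigFactory.load()` throws an exception instead of printing a value. That failure is a useful signal in its own right.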
