Deserialization error in Spark with immutable Java (+ Lombok) classes

Problem description

I have this simple model class:

import java.io.Serializable;
import lombok.Value;

@Value // Lombok: makes fields private final, generates the all-args constructor and getters
public class ModelA implements Serializable {
    private String word;
    private double value;
}

This simple test fails:

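import static org.assertj.core.api.Assertions.assertThat; // AssertJ assumed

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.junit.Test; // JUnit 4 assumed; with JUnit 5 use org.junit.jupiter.api.Test

public class SparkSerializationTest {

    private SparkSession spark = SparkSession.builder()
            .master("local")
            .appName("Test")
            .getOrCreate();

    @Test
    public void testSerializationModelA() {
        ModelA modelA1 = new ModelA("A1", 12.34);
        ModelA modelA2 = new ModelA("A2", 56.78);

        Dataset<ModelA> dataset = spark.createDataset(
                Arrays.asList(modelA1, modelA2),
                Encoders.bean(ModelA.class));

        List<ModelA> yo = dataset.collectAsList(); // <== *** failure here ***

        assertThat(yo).isEqualTo(Arrays.asList(modelA1, modelA2));
    }
}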

Exception:

java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 24, Column 67: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 24, Column 67: No applicable constructor/method found for zero actual parameters; candidates are: "com.xxx.yyy.ModelA(java.lang.String, double)"

It seems to require a zero-arg constructor. But I want my model to be immutable, with an all-args constructor and no setters. How should I do this?

Recommended answer

Easy way out

Just give the class a no-args constructor and no setters. The class is then technically mutable, but in a slightly less chaotic fashion than if you exposed a setter for every field. If you use Kryo as your deserializer (which I think you already do), you can keep this constructor private. Note that the test in the question builds its encoder with Encoders.bean, which follows JavaBean conventions; Kryo only comes into play with an encoder such as Encoders.kryo(ModelA.class).

An all-args constructor can still be called with nulls and nonsensical values. If you want to enforce a contract on the validity of the object, validate explicitly. And if immutability is what you are after, note that with a no-args constructor your fields can no longer be final.
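As a rough illustration of explicit validation, here is a minimal plain-Java sketch (no Lombok). The class name ValidatedModel and the rules it enforces (non-null, non-empty word; finite value) are assumptions made up for this example, not something the question specifies.

```java
import java.util.Objects;

class ValidationDemo {

    /** Hypothetical stand-in for ModelA: immutable, and validated in the constructor. */
    static final class ValidatedModel {
        private final String word;
        private final double value;

        ValidatedModel(String word, double value) {
            // Fail fast on null instead of storing it.
            this.word = Objects.requireNonNull(word, "word must not be null");
            if (word.isEmpty()) {
                throw new IllegalArgumentException("word must not be empty");
            }
            // NaN and infinities are legal doubles; rejecting them is an assumed rule.
            if (Double.isNaN(value) || Double.isInfinite(value)) {
                throw new IllegalArgumentException("value must be finite");
            }
            this.value = value;
        }

        String getWord() { return word; }
        double getValue() { return value; }
    }

    public static void main(String[] args) {
        ValidatedModel ok = new ValidatedModel("A1", 12.34);
        System.out.println(ok.getWord() + " " + ok.getValue());
        try {
            new ValidatedModel(null, 1.0);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The checks only run when the constructor runs, so an instance created through a no-args constructor or a custom instantiator and then filled by reflection would bypass them.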


Dynamic nature of object creation

Deserialization works by calling the simplest (no-args) constructor, because that is much easier for the authors of the serialization library to implement. The alternative would be to assemble a single call to the all-args constructor from all the necessary properties, whose order could be arbitrary and whose assignment to the object's fields is not guaranteed.

Instead, they create a vanilla object and populate it via setters or reflection, making sure the names match between the serialized form and the object. Doing the same through an all-args constructor would be less reliable and much harder to implement.
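To make the "create a vanilla object, then populate it" idea concrete, here is a simplified sketch using only java.lang.reflect. It is not how Kryo or Spark are actually implemented; the instantiate helper, the Model class, and the map standing in for serialized data are all hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class ReflectiveFillDemo {

    /** Hypothetical model: private no-args constructor, no setters, non-final fields. */
    static class Model {
        private String word;
        private double value;

        private Model() { } // intended for reflective use only

        Model(String word, double value) {
            this.word = word;
            this.value = value;
        }

        String getWord() { return word; }
        double getValue() { return value; }
    }

    /** Creates an empty instance, then copies each named value into the field of the same name. */
    static <T> T instantiate(Class<T> type, Map<String, Object> values) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true); // the constructor may be private
            T instance = ctor.newInstance();
            for (Map.Entry<String, Object> entry : values.entrySet()) {
                Field field = type.getDeclaredField(entry.getKey());
                field.setAccessible(true);
                field.set(instance, entry.getValue()); // matched by name, so order is irrelevant
            }
            return instance;
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), ex);
        }
    }

    public static void main(String[] args) {
        // Stand-in for deserialized data; deliberately not in declaration order.
        Map<String, Object> serialized = new LinkedHashMap<>();
        serialized.put("value", 12.34);
        serialized.put("word", "A1");

        Model model = instantiate(Model.class, serialized);
        System.out.println(model.getWord() + " " + model.getValue()); // prints "A1 12.34"
    }
}
```

Because fields are matched by name, the deserializer never needs to know the parameter order of any constructor, which is the simplification described above.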

If you need to keep your immutability, you have to use custom object creation. Have a look at Kryo's example for custom object creation:

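import com.esotericsoftware.kryo.Registration;
import org.objenesis.instantiator.ObjectInstantiator;

// Tell Kryo how to build SomeClass instead of calling a no-args constructor.
Registration registration = kryo.register(SomeClass.class);
registration.setInstantiator(new ObjectInstantiator<SomeClass>() {
  @Override
  public SomeClass newInstance () {
    return new SomeClass("some constructor arguments", 1234);
  }
});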

