Explicitly specifying a schema for reading JSON so that missing fields are read as null

Question

I am creating a Dataset<Person> like this:

Dataset<Person> personDs = sparkSession.read().json("people.json").as(Encoders.bean(Person.class));

where Person is

class Person {
    private String name;
    private String placeOfBirth;

    //Getters and setters
    ...
}

If my input data only contains a name ({"name" : "bob"}), I get an error org.apache.spark.sql.AnalysisException: cannot resolve 'placeOfBirth' given input columns: [name].

Is there any way for me to tell Spark that placeOfBirth (or any other field) can be null?

Recommended answer

With Spark 2.3.0 and Scala 2.11.12, passing an explicit schema to the reader worked for me:

sparkSession.read.schema("name String, placeOfBirth String").json("people.json").as(Encoders.bean(classOf[Person])).show()

Output:

+----+------------+
|name|placeOfBirth|
+----+------------+
| bob|        null|
+----+------------+
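Since the question uses the Java API, here is a minimal Java sketch of the same idea: give the reader an explicit schema in which every field is nullable, so a key that is absent from the JSON becomes a null column instead of a missing one. The class name ExplicitSchemaExample, the helper methods personSchema and readPeople, the temporary input file, and the accessor methods on Person are assumptions made for illustration; the original post elides the getters and setters.

```java
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ExplicitSchemaExample {

    // Assumed shape of the Person bean; the original elides the accessors.
    public static class Person implements Serializable {
        private String name;
        private String placeOfBirth;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public String getPlaceOfBirth() { return placeOfBirth; }
        public void setPlaceOfBirth(String placeOfBirth) { this.placeOfBirth = placeOfBirth; }
    }

    // Hypothetical helper: both fields are declared nullable, so a key that is
    // missing from an input record is read as null.
    static StructType personSchema() {
        return new StructType()
                .add("name", DataTypes.StringType, true)
                .add("placeOfBirth", DataTypes.StringType, true);
    }

    // Hypothetical helper: read JSON with the explicit schema instead of
    // letting Spark infer the columns from the data.
    static Dataset<Person> readPeople(SparkSession spark, String path) {
        return spark.read()
                .schema(personSchema())
                .json(path)
                .as(Encoders.bean(Person.class));
    }

    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("explicit-schema-example")
                .master("local[1]")
                .getOrCreate();
        try {
            // Write the single-record input from the question to a temp file.
            Path input = Files.createTempFile("people", ".json");
            Files.write(input, "{\"name\" : \"bob\"}\n".getBytes(StandardCharsets.UTF_8));
            readPeople(spark, input.toString()).show();
        } finally {
            spark.stop();
        }
    }
}
```

If keeping the schema in sync with the bean by hand is a concern, Encoders.bean(Person.class).schema() returns the schema Spark derives from the bean, and it may be possible to pass that to schema(...) instead of building a StructType manually; I have not verified that variant against the versions mentioned above.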