Problem description
I need to extract a table from Teradata (read-only access) to Parquet with Scala (2.11) / Spark (2.1.0). I'm building a DataFrame that loads successfully:
val df = spark.read.format("jdbc").options(options).load()
But df.show
gives me a NullPointerException:
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:210)
I did a df.printSchema
and found out that the reason for this NPE is that the dataset contains null
values for (nullable = false)
columns (it looks like Teradata is giving me wrong information). Indeed, I can achieve a df.show
if I drop the problematic columns.
So, I tried specifying a new schema with all columns set to (nullable = true)
:
val new_schema = StructType(df.schema.map {
  case StructField(n, d, nu, m) => StructField(n, d, true, m)
})
val new_df = spark.read.format("jdbc").schema(new_schema).options(options).load()
But then I got:
org.apache.spark.sql.AnalysisException: JDBC does not allow user-specified schemas.;
I also tried to create a new DataFrame from the previous one, specifying the wanted schema:
val new_df = df.sqlContext.createDataFrame(df.rdd, new_schema)
But I still got an NPE when taking an action on the DataFrame.
Any idea how I could fix this?
Recommended answer
I think this is resolved in the latest Teradata JDBC driver jars. After all my research, I updated my Teradata jars (terajdbc4.jar and tdgssconfig.jar) to version 16.20.00.04 and changed the Teradata URL to:
teradata.connection.url=jdbc:teradata://hostname.some.com/TMODE=ANSI,CHARSET=UTF8,TYPE=FASTEXPORT,COLUMN_NAME=ON,MAYBENULL=ON
It worked after I added the Teradata URL properties COLUMN_NAME=ON and MAYBENULL=ON.
Now everything works fine.
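Putting the pieces together, here is a minimal sketch of the full read with the updated URL. The hostname, database, table name, credentials, and output path are placeholders, and the option names assume Spark's standard JDBC data source:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("TeradataExtract").getOrCreate()

// COLUMN_NAME=ON and MAYBENULL=ON make the driver report column nullability
// correctly, so Spark no longer marks actually-nullable columns as
// (nullable = false), which was the cause of the NPE in df.show.
val url = "jdbc:teradata://hostname.some.com/" +
  "TMODE=ANSI,CHARSET=UTF8,TYPE=FASTEXPORT,COLUMN_NAME=ON,MAYBENULL=ON"

val df = spark.read.format("jdbc")
  .option("url", url)
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("dbtable", "mydb.mytable")   // placeholder table
  .option("user", "user")              // placeholder credentials
  .option("password", "password")
  .load()

df.write.parquet("/path/to/output")    // placeholder output path
```

This is a connection sketch and requires a live Teradata instance plus terajdbc4.jar and tdgssconfig.jar (16.20.00.04 or later) on the classpath to actually run.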
You can check the reference document here.