This question already has answers here:
SparkSQL: How to deal with null values in user defined function?

(3 answers)

Closed 4 years ago.
I'm trying to replace every instance of ":" with "_" in a single column of a Spark DataFrame. I'm attempting this:
import org.apache.spark.sql.functions.udf

val url_cleaner = (s: String) => {
  s.replaceAll(":", "_")
}
val url_cleaner_udf = udf(url_cleaner)
val df = old_df.withColumn("newCol", url_cleaner_udf(old_df("oldCol")))

But I keep getting this error:
 SparkException: Job aborted due to stage failure: Task 0 in stage 25.0 failed 4 times, most recent failure: Lost task 0.3 in stage 25.0 (TID 692, ip-10-81-194-29.ec2.internal): java.lang.NullPointerException

Where am I going wrong in the UDF?

Best Answer

You probably have some null values in that column.

Try:

val urlCleaner = (s: String) => {
  if (s == null) null else s.replaceAll(":", "_")
}
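The null check can be verified without a Spark session, since the cleaner is just a plain Scala function. A minimal sketch (the sample URL is illustrative):

```scala
// Null-safe cleaner: passes null through instead of throwing an NPE.
val urlCleaner = (s: String) => if (s == null) null else s.replaceAll(":", "_")

println(urlCleaner("http://a:b"))  // http_//a_b
println(urlCleaner(null))          // null
```

Wrapping this in `udf(urlCleaner)` then works even on rows where the column is null, which is what caused the `NullPointerException` above.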

You can also use regexp_replace(col("oldCol"), ":", "_") instead of writing your own function.
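One caveat worth noting: both `regexp_replace` and `String.replaceAll` treat the pattern argument as a regular expression, not a literal string. ":" happens to be a literal in regex syntax, but metacharacters such as "." would need escaping. A small plain-Scala sketch of the distinction (sample strings are illustrative):

```scala
// ":" has no special regex meaning, so it is replaced literally.
println("a.b:c".replaceAll(":", "_"))    // a.b_c

// "." is a regex metacharacter and must be escaped to match a literal dot.
println("a.b:c".replaceAll("\\.", "_"))  // a_b:c
```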

Regarding scala - replacing all ":" with "_" in a Spark DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/39308928/
