I'm just getting started, and I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |
+----------+------------+-----------+
| 1        | 2          | 1         |
+----------+------------+-----------+
| 4        | null       | null      |
+----------+------------+-----------+
| 5        | null       | null      |
+----------+------------+-----------+
| 6        | 8          | 2         |
+----------+------------+-----------+

When I subtract the two columns and one of them is null, the result column is also null:
df.withColumn("Sub", col("Column1") - col("Column2"))
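This happens because Spark arithmetic follows SQL null semantics: if either operand is null, the result is null. The sketch below is a plain-Scala analogy, not Spark code. Option[Int] stands in for a nullable column value, and the object and method names are hypothetical.

```scala
// Plain-Scala model of SQL null propagation (an analogy, not Spark code):
// Option[Int] plays the role of a nullable column value.
object NullSemantics {
  // Behaves like col("Column1") - col("Column2"): a None (null) on either side gives None
  def subPropagating(a: Option[Int], b: Option[Int]): Option[Int] =
    for { x <- a; y <- b } yield x - y

  def main(args: Array[String]): Unit = {
    println(subPropagating(Some(1), Some(2))) // prints Some(-1)
    println(subPropagating(Some(4), None))    // prints None, like the null Sub rows
  }
}
```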

The expected output should be:
+----------+------------+-----------+
| Column1  | Column2    | Sub       |
+----------+------------+-----------+
| 1        | 2          | 1         |
+----------+------------+-----------+
| 4        | null       | 4         |
+----------+------------+-----------+
| 5        | null       | 5         |
+----------+------------+-----------+
| 6        | 8          | 2         |
+----------+------------+-----------+

I don't want to replace Column2 with 0; it should stay null.
Can anyone help me?

Best answer

You can use the when function like this:

import org.apache.spark.sql.functions._

// Treat a null operand as 0 only inside the subtraction; Column2 itself stays null
df.withColumn("Sub",
  when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) -
    when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))

You should get this final result:
+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
|      1|      2|-1.0|
|      4|   null| 4.0|
|      5|   null| 5.0|
|      6|      8|-2.0|
+-------+-------+----+
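A more compact variant swaps in coalesce, which returns its first non-null argument, for the when/otherwise pair. This is a substitute technique, not the answer's code. The sketch below is self-contained and assumes a local SparkSession, integer columns, and the hypothetical names NullSafeSubtract and withNullSafeSub. With integer inputs, Sub comes out as an integer column rather than the doubles shown above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

// Hypothetical helper; swaps coalesce in for the answer's when/otherwise
object NullSafeSubtract {
  // coalesce(x, lit(0)) yields x when it is non-null, otherwise 0, so a null
  // operand counts as 0 only inside this expression; Column2 itself is unchanged
  def withNullSafeSub(df: DataFrame): DataFrame =
    df.withColumn("Sub", coalesce(col("Column1"), lit(0)) - coalesce(col("Column2"), lit(0)))

  def main(args: Array[String]): Unit = {
    // Local session for experimenting; spark-shell already provides a `spark` value
    val spark = SparkSession.builder().master("local[*]").appName("null-safe-sub").getOrCreate()
    import spark.implicits._

    // The question's data, assuming integer columns; Option[Int] makes Column2 nullable
    val df = Seq[(Int, Option[Int])]((1, Some(2)), (4, None), (5, None), (6, Some(8)))
      .toDF("Column1", "Column2")

    withNullSafeSub(df).show()
    spark.stop()
  }
}
```

Both versions leave the null in Column2 as-is and only substitute 0 while computing Sub.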

10-01 00:50