This article describes how to replace null values in a Spark DataFrame, which should be a useful reference for anyone tackling the same problem. Read on to see the solution.

Problem Description

I saw a solution here, but when I tried it, it didn't work for me.

First, I import a cars.csv file:

val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "true")
              .load("/usr/local/spark/cars.csv")

It looks like this:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

Then I do this:

df.na.fill("e",Seq("blank"))

But the null values didn't change.

Can anyone help me?

Recommended Answer

This is basically very simple. You need to create a new DataFrame. I'm using the DataFrame df that you defined earlier.

val newDf = df.na.fill("e",Seq("blank"))

DataFrames are immutable structures. Each time you perform a transformation that you need to keep, you have to assign the transformed DataFrame to a new value.
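
To make that concrete, here is a minimal sketch of the whole pattern using the df and newDf from above; the show() calls are only for illustration:

val newDf = df.na.fill("e", Seq("blank"))   // returns a new DataFrame; df itself is unchanged
df.show()                                   // still contains the original null values
newDf.show()                                // the "blank" column now shows "e" where it was null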

That concludes this article on replacing null values in a Spark DataFrame. We hope the recommended answer is helpful, and thank you for your support!
