


I want to change the nullable property of a particular column in a Spark Dataframe.


Printing the schema of the DataFrame currently gives the following:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = false)
col4: float (nullable = true)


I only want the nullable property of col3 to be updated:

col1: string (nullable = false)
col2: string (nullable = true)
col3: string (nullable = true)
col4: float (nullable = true)


I checked online and found some links, but they seem to change the property for all columns rather than for one specific column; see "Change nullable property of column in spark dataframe". Can anyone please help me with this?



There is no "clear" way to do this. You can use a trick like this:


import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{StructField, StructType}

def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
  // get schema
  val schema = df.schema
  // modify the StructField with name `cn`; leave all other fields untouched
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
    case y: StructField => y
  })
  // apply new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}


This copies the DataFrame and its schema, while setting the nullable flag programmatically.
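As a rough illustration, the single-column helper above could be applied to the schema from the question as sketched below. The object name `NullableExample`, the local `SparkSession`, and the sample row are assumptions made only for this demonstration; the helper is repeated so that the sketch compiles on its own.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{FloatType, StringType, StructField, StructType}

object NullableExample {
  // Same helper as in the answer, repeated so this sketch is self-contained.
  def setNullableStateOfColumn(df: DataFrame, cn: String, nullable: Boolean): DataFrame = {
    val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if c.equals(cn) => StructField(c, t, nullable = nullable, m)
      case y: StructField => y
    })
    df.sqlContext.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-example")
      .getOrCreate()

    // Schema mirroring the one printed in the question.
    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = false),
      StructField("col2", StringType, nullable = true),
      StructField("col3", StringType, nullable = false),
      StructField("col4", FloatType, nullable = true)
    ))
    // Hypothetical sample row; any data matching the schema would do.
    val rows = spark.sparkContext.parallelize(Seq(Row("a", "b", "c", 1.0f)))
    val df = spark.createDataFrame(rows, schema)

    val updated = setNullableStateOfColumn(df, "col3", nullable = true)
    updated.printSchema()

    // Only col3 changes; the other columns keep their original flags.
    assert(updated.schema("col3").nullable)
    assert(!updated.schema("col1").nullable)

    spark.stop()
  }
}
```

Since the helper goes through `df.rdd`, the result is a new DataFrame built from the same rows; the original `df` keeps its old schema.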


// Variant that takes a map of column name -> nullable flag,
// so several columns can be updated in one call.
def setNullableStateOfColumn(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {
  // get schema
  val schema = df.schema
  // modify every StructField whose name is a key in `nullValues`
  val newSchema = StructType(schema.map {
    case StructField(c, t, _, m) if nullValues.contains(c) => StructField(c, t, nullable = nullValues(c), m)
    case y: StructField => y
  })
  // apply new schema
  df.sqlContext.createDataFrame(df.rdd, newSchema)
}

Usage: setNullableStateOfColumn(df1, Map("col1" -> true, "col2" -> true, "col7" -> false))
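A minimal sketch of the map-based variant in use is shown below, under the assumption of a local `SparkSession` and a small two-column DataFrame. The object name `NullableMapExample` and the column name `col9` are hypothetical. It also illustrates one consequence of how the pattern match is written: a key in the map that matches no column is simply ignored.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object NullableMapExample {
  // Map-based helper, repeated so this sketch is self-contained.
  def setNullableStateOfColumn(df: DataFrame, nullValues: Map[String, Boolean]): DataFrame = {
    val newSchema = StructType(df.schema.map {
      case StructField(c, t, _, m) if nullValues.contains(c) =>
        StructField(c, t, nullable = nullValues(c), m)
      case y: StructField => y
    })
    df.sqlContext.createDataFrame(df.rdd, newSchema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this demonstration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nullable-map-example")
      .getOrCreate()

    val schema = StructType(Seq(
      StructField("col1", StringType, nullable = false),
      StructField("col3", StringType, nullable = false)
    ))
    val df = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row("a", "c"))), schema)

    // "col9" is a hypothetical name that does not exist in df.
    val updated = setNullableStateOfColumn(df, Map("col3" -> true, "col9" -> false))

    // The column list is unchanged; only col3's flag differs.
    assert(updated.schema.fieldNames.sameElements(Array("col1", "col3")))
    assert(updated.schema("col3").nullable)
    assert(!updated.schema("col1").nullable)

    spark.stop()
  }
}
```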

