Convert multiple different columns to a Map column using Spark DataFrame in Scala

Problem description

I have a data frame with columns: user, address1, address2, address3, phone1, phone2, and so on. I want to convert this data frame to user, address, phone, where address = Map("address1" -> address1.value, "address2" -> address2.value, "address3" -> address3.value).
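
For concreteness, here is a minimal sketch of the input and the target shape (Spark 2.x assumed; the values below are made up, only the column names come from my data):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Input: one flat row per user, with numbered address/phone columns.
val input = Seq(
  (1L, "addr1", "addr2", "addr3", "555-0001", "555-0002")
).toDF("user", "address1", "address2", "address3", "phone1", "phone2")

// Target schema:
// user: bigint, address: map<string,string>, phone: map<string,string>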

I was able to convert the columns to a map using:

val mapData = List("address1", "address2", "address3")
df.map(_.getValuesMap[Any](mapData))

but I am not sure how to add this back to my df.
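
As far as I can tell this gives me a standalone collection of maps rather than a new column. A sketch of what it yields (under Spark 1.x, where DataFrame.map delegates to the underlying RDD):

// addressMaps is an RDD[Map[String, Any]], detached from df's rows,
// which is why it cannot simply be attached back as a column.
val addressMaps = df.map(_.getValuesMap[Any](mapData))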

I am new to Spark and Scala and could really use some help here.

Recommended answer

Spark >= 2.0

You can skip the UDF and use the map SQL function (create_map in Python):

import org.apache.spark.sql.functions.{map, lit, col}

// Interleave each column name (as a literal key) with the column itself,
// then build a single MapType column from the resulting key/value pairs.
df.select(
  map(mapData.map(c => lit(c) :: col(c) :: Nil).flatten: _*).alias("a_map")
)
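
Applied to a frame like the one built in the Spark < 2.0 example below, and keeping the user column alongside the generated map (a sketch, not verified output):

df.select(
  col("user"),
  map(mapData.flatMap(c => Seq(lit(c), col(c))): _*).alias("address")
)
// address is a single map<string,string> column keyed by the original
// column names, e.g. Map(address1 -> addr1, address2 -> addr2, ...)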

Spark < 2.0

As far as I know there is no direct way to do it. You can use a UDF like this:

import org.apache.spark.sql.functions.{udf, array, lit, col}

val df = sc.parallelize(Seq(
  (1L, "addr1", "addr2", "addr3")
)).toDF("user", "address1", "address2", "address3")

// Zip the key names with the row's values and drop pairs whose value is
// null, so the resulting map only contains populated addresses.
val asMap = udf((keys: Seq[String], values: Seq[String]) =>
  keys.zip(values).filter {
    case (_, null) => false
    case _ => true
  }.toMap)

// Pack the literal column names and the column values into two arrays,
// which the UDF consumes pairwise.
val keys = array(mapData.map(lit): _*)
val values = array(mapData.map(col): _*)

val dfWithMap = df.withColumn("address", asMap(keys, values))
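
A quick sanity check on the example data (a sketch): the three address columns collapse into a single map, and because the UDF filters out nulls, a missing address would simply be absent from it:

dfWithMap.select("user", "address").show(false)
// user = 1, address = Map(address1 -> addr1, address2 -> addr2, address3 -> addr3)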

Another option, which doesn't require UDFs, is to use a struct field instead of a map:

import org.apache.spark.sql.functions.struct

val dfWithStruct = df.withColumn("address", struct(mapData.map(col): _*))

The biggest advantage is that it can easily handle values of different types.
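
To illustrate with the example df (a sketch): a map needs one common value type, whereas a struct keeps each field's own type, so mixing the bigint user column with a string address is no problem:

val mixed = df.withColumn("info", struct(col("user"), col("address1")))
mixed.printSchema()
// |-- info: struct (nullable = false)
// |    |-- user: long (nullable = false)
// |    |-- address1: string (nullable = true)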
