This article shows how to iterate over the rows and columns of a Spark dataframe. It should be a useful reference for anyone facing the same problem; follow along below.
Problem Description
I have the following Spark dataframe that is created dynamically:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Build the schema field by field
val sf1 = StructField("name", StringType, nullable = true)
val sf2 = StructField("sector", StringType, nullable = true)
val sf3 = StructField("age", IntegerType, nullable = true)
val fields = List(sf1, sf2, sf3)
val schema = StructType(fields)

// Create the rows and build the dataframe from them
val row1 = Row("Andy", "aaa", 20)
val row2 = Row("Berta", "bbb", 30)
val row3 = Row("Joe", "ccc", 40)
val data = Seq(row1, row2, row3)
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

// Register a temporary view and query it with SQL
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
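As a quick sanity check (assuming this runs in a spark-shell or another session where spark is already a SparkSession), you can print the schema and contents of the freshly built dataframe:

// Inspect the structure and data of the dynamically created dataframe
df.printSchema()
df.show()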
Now, I need to iterate over each row and column in sqlDF to print each column. This is my attempt:
sqlDF.foreach { row =>
  row.foreach { col => println(col) }
}
row is of type Row, but it is not iterable, which is why this code throws a compilation error at row.foreach. How can I iterate over each column in a Row?
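To make the error more concrete: Row is not a Scala collection and has no foreach of its own, but it does expose its values positionally through length and get(i). A minimal sketch using that positional API (an alternative to the toSeq approach recommended below) would be:

sqlDF.foreach { row =>
  // Access each column by its position instead of iterating the Row directly
  (0 until row.length).foreach(i => println(row.get(i)))
}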
Recommended Answer
You can convert a Row to a Seq with toSeq. Once turned into a Seq, you can iterate over it as usual with foreach, map, or whatever you need:
sqlDF.foreach { row =>
  row.toSeq.foreach { col => println(col) }
}
Output:
Berta
bbb
30
Joe
Andy
aaa
20
ccc
40
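One thing worth noting: the println calls inside foreach run on the executors, which is why the rows above come out interleaved and in no particular order (on a real cluster the output would land in the executor logs rather than the driver console). If the data is small enough to fit in driver memory, one alternative is to collect first and print each row on the driver:

// Bring all rows to the driver, then print each one as a single comma-separated line
sqlDF.collect().foreach { row =>
  println(row.toSeq.mkString(", "))
}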
This concludes the article on iterating over the rows and columns of a Spark dataframe. We hope the recommended answer helps, and thank you for your continued support!