This article looks at how to iterate over the rows and columns of a Spark DataFrame; the question and answer below may be a useful reference.
Problem description
I have the following Spark dataframe that is created dynamically:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sf1 = StructField("name", StringType, nullable = true)
val sf2 = StructField("sector", StringType, nullable = true)
val sf3 = StructField("age", IntegerType, nullable = true)
val fields = List(sf1, sf2, sf3)
val schema = StructType(fields)
val row1 = Row("Andy", "aaa", 20)
val row2 = Row("Berta", "bbb", 30)
val row3 = Row("Joe", "ccc", 40)
val data = Seq(row1, row2, row3)
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
Now, I need to iterate over each row and column in sqlDF to print each column. This is my attempt:
sqlDF.foreach { row =>
  row.foreach { col => println(col) }
}
row is of type Row, but it is not iterable, which is why this code throws a compilation error at row.foreach. How can I iterate over each column in a Row?
Recommended answer
You can convert a Row to a Seq with toSeq. Once turned into a Seq, you can iterate over it as usual with foreach, map, or whatever you need:
sqlDF.foreach { row =>
  row.toSeq.foreach { col => println(col) }
}
Output (the order is not deterministic, since foreach runs in parallel across partitions):
Berta
bbb
30
Joe
Andy
aaa
20
ccc
40
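If you also need the column names alongside the values, one possible variant (a sketch, assuming the same sqlDF as above) pairs each value with its field name through the row's schema:

```scala
sqlDF.foreach { row =>
  // Row.schema gives the row's StructType; fieldNames lines up
  // positionally with the values returned by toSeq
  row.schema.fieldNames.zip(row.toSeq).foreach { case (name, value) =>
    println(s"$name = $value")
  }
}
```

Note that foreach runs on the executors, so in cluster mode the println output lands in the executor logs rather than on the driver console; collecting first (sqlDF.collect().foreach { ... }) is a common way to print on the driver when the data is small enough.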
That concludes this look at iterating over the rows and columns of a Spark DataFrame; hopefully the answer above helps.