This article looks at how to iterate over the rows and columns of a Spark DataFrame; the question and answer below may be a useful reference.
Problem description
I have the following Spark dataframe that is created dynamically:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sf1 = StructField("name", StringType, nullable = true)
val sf2 = StructField("sector", StringType, nullable = true)
val sf3 = StructField("age", IntegerType, nullable = true)
val fields = List(sf1, sf2, sf3)
val schema = StructType(fields)
val row1 = Row("Andy", "aaa", 20)
val row2 = Row("Berta", "bbb", 30)
val row3 = Row("Joe", "ccc", 40)
val data = Seq(row1, row2, row3)
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.createOrReplaceTempView("people")
val sqlDF = spark.sql("SELECT * FROM people")
Now, I need to iterate over each row and column in sqlDF to print each column. This is my attempt:
sqlDF.foreach { row =>
  row.foreach { col => println(col) }
}
row is of type Row, but it is not iterable, which is why this code throws a compilation error at row.foreach. How can I iterate over each column in a Row?
Recommended answer
You can convert a Row to a Seq with toSeq. Once turned into a Seq, you can iterate over it as usual with foreach, map, or whatever you need:
sqlDF.foreach { row =>
  row.toSeq.foreach { col => println(col) }
}
Output (the order is not deterministic, since foreach runs in parallel across partitions):
Berta
bbb
30
Joe
Andy
aaa
20
ccc
40
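If you also need the column names alongside the values, one possible variant (a sketch, assuming the same sqlDF as above) pairs each value with its field name through the row's schema:

```scala
sqlDF.foreach { row =>
  // Row.schema gives the row's StructType; fieldNames lines up
  // positionally with the values returned by toSeq
  row.schema.fieldNames.zip(row.toSeq).foreach { case (name, value) =>
    println(s"$name = $value")
  }
}
```

Note that foreach runs on the executors, so in cluster mode the println output lands in the executor logs rather than on the driver console; collecting first (sqlDF.collect().foreach { ... }) is a common way to print on the driver when the data is small enough.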
That concludes this look at iterating over the rows and columns of a Spark DataFrame; hopefully the answer above helps.