Problem description

I'm trying to create "n" DataFrames from the data of a single one. I check the integer values of one column of the DataFrame and run an SQL statement in a loop, creating one DataFrame for each integer value in that column.

Here is my code:

val maxvalue = spark.sql("SELECT MAX(column4) AS maxval FROM mydata").collect()(0).getInt(0)
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  // add newdataframe to a List here (how?)
}

I need to create "n" DataFrames, but I don't know how to declare the List before the loop and populate it inside the for loop.

Schema of the existing DataFrame:

// +------------+------------+------------+------------+
// |     column1|     column2|     column3|     column4|
// +------------+------------+------------+------------+
// |      String|      Double|         Int|         Int|
// +------------+------------+------------+------------+

Schema of the new DataFrames:

// +------------+------------+------------+
// |     column1|     column2|     column3|     
// +------------+------------+------------+
// |      String|      Double|         Int|
// +------------+------------+------------+

Recommended answer

You can create a mutable list and populate it:

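import scala.collection.mutable
import org.apache.spark.sql.DataFrame

val dfs = mutable.ArrayBuffer[DataFrame]()
for (i <- 0 to maxvalue) {
  val query = "SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i
  val newdataframe = spark.sql(query)
  dfs += newdataframe  // append the new DataFrame to the buffer
}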

But a better approach (one that avoids a mutable data structure) is to map the range of integers to a sequence of DataFrames:

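import org.apache.spark.sql.DataFrame

// One DataFrame per integer i, in order from 0 to maxvalue
val dfs: Seq[DataFrame] = (0 to maxvalue).map { i =>
  spark.sql("SELECT column1, column2, column3 FROM mydata WHERE column4 = " + i)
}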
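A variant worth considering, sketched here under stated assumptions: column4 is a non-null Int column and the data has the schema shown in the question. Instead of building an SQL string for every integer from 0 to the maximum (which also yields empty DataFrames for values that never occur, and skips negative values), this sketch collects the distinct values of column4 and uses the DataFrame API's filter, returning a Map from each value to its DataFrame. The names SplitByColumn4 and splitByColumn4 and the sample rows are hypothetical, for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object; not part of the original answer.
object SplitByColumn4 {
  // Assumes column4 is a non-null Int column (getInt throws on null).
  def splitByColumn4(df: DataFrame): Map[Int, DataFrame] = {
    // Bring only the distinct keys to the driver.
    val keys: Seq[Int] =
      df.select("column4").distinct().collect().map(_.getInt(0)).toSeq.sorted
    // One lazily evaluated DataFrame per key, keeping only the first three columns.
    keys.map { k =>
      k -> df.filter(col("column4") === k).select("column1", "column2", "column3")
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the question's schema (String, Double, Int, Int).
    val mydata = Seq(("a", 1.0, 10, 0), ("b", 2.0, 20, 2), ("c", 3.0, 30, 2))
      .toDF("column1", "column2", "column3", "column4")

    val parts = splitByColumn4(mydata)
    parts.toSeq.sortBy(_._1).foreach { case (k, part) =>
      println(s"column4 = $k: ${part.count()} rows")
    }
    spark.stop()
  }
}
```

Each DataFrame in the map is evaluated lazily and reads the source again when an action runs, so calling cache() on the source may pay off if you use all the parts. Collecting the distinct keys to the driver is reasonable when column4 has few distinct values.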

09-25 11:59