Creating a Spark DataFrame from a single string
Question
I'm trying to take a hardcoded String and turn it into a 1-row Spark DataFrame (with a single column of type StringType) such that:
String fizz = "buzz"
would result in a DataFrame whose .show() output looks like:
+-----+
| fizz|
+-----+
| buzz|
+-----+
My best attempt thus far has been:
val rawData = List("fizz")
val df = sqlContext.sparkContext.parallelize(Seq(rawData)).toDF()
df.show()
But I get the following exception at runtime:
java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:413)
at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:155)
Any ideas as to where I'm going awry? Also, how do I set "buzz" as the row value for the fizz column?
Trying:
sqlContext.sparkContext.parallelize(rawData).toDF()
I get a DF that looks like:
+----+
| _1|
+----+
|buzz|
+----+
Recommended answer
Try:
sqlContext.sparkContext.parallelize(rawData).toDF()
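A likely reason the first attempt fails is the extra Seq(...) wrapper: it produces a collection whose single element is the whole list, so Spark infers an array type for the row instead of a struct. The plain Scala sketch below only illustrates that difference in shape, with no Spark involved. The object name NestingSketch is hypothetical, and it assumes rawData holds the value "buzz".

```scala
object NestingSketch {
  // Assumption: the list holds the desired row value, "buzz".
  val rawData: List[String] = List("buzz")

  // Wrapping the list in Seq(...) yields one element that is itself a list.
  val nested: Seq[List[String]] = Seq(rawData)

  // Passing the list directly yields one element per string.
  val flat: Seq[String] = rawData

  def main(args: Array[String]): Unit = {
    println(nested.head) // the single element is the whole list
    println(flat.head)   // the single element is the string itself
  }
}
```

Parallelizing the flat collection gives Spark one plain string per row, which is the shape the call above relies on.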
In Spark 2.0 you can:
import spark.implicits._
rawData.toDF
Optionally provide a sequence of names for toDF:
sqlContext.sparkContext.parallelize(rawData).toDF("fizz")
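Putting the pieces together for Spark 2.x, a minimal self-contained sketch might look like the following. It assumes a local SparkSession; the object name SingleStringDataFrame, the helper singleStringDf, and the application name are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleStringDataFrame {
  // Hypothetical helper: wraps one string in a one-row, one-column DataFrame.
  def singleStringDf(spark: SparkSession, column: String, value: String): DataFrame = {
    import spark.implicits._ // brings toDF into scope for local Seqs
    Seq(value).toDF(column)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local, single-threaded session is enough for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("single-string-df")
      .getOrCreate()

    val df = singleStringDf(spark, "fizz", "buzz")
    df.printSchema() // one column named fizz, of string type
    df.show()        // one row containing buzz

    spark.stop()
  }
}
```

Passing the column name to toDF replaces the default name that Spark would otherwise assign.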