Question
Is there a simple way to convert a given Row object to JSON?
Found this about converting a whole DataFrame to JSON output: Spark Row to JSON.
But I just want to convert one Row to JSON. Here is pseudocode for what I am trying to do.
More precisely, I am reading JSON as input into a DataFrame. I am producing a new output that is mainly based on columns, but with one JSON field for all the info that does not fit into the columns.
My question: what is the easiest way to write this function, convertRowToJson()?
def convertRowToJson(row: Row): String = ???
def transformVenueTry(row: Row): Try[Venue] = {
Try({
val name = row.getString(row.fieldIndex("name"))
val metadataRow = row.getStruct(row.fieldIndex("meta"))
val score: Double = calcScore(row)
// pseudocode: Row has no ++ operator; the intent is to add a "score" field to the metadata
val combinedRow: Row = metadataRow ++ ("score" -> score)
val jsonString: String = convertRowToJson(combinedRow)
Venue(name = name, json = jsonString)
})
}
Psidom's solution:
import scala.util.parsing.json.JSONObject

def convertRowToJSON(row: Row): String = {
  // getValuesMap only flattens the top level of the Row
  val m = row.getValuesMap(row.schema.fieldNames)
  JSONObject(m).toString()
}
This only works if the Row has a single level, not with nested Rows. This is the schema:
StructType(
  StructField(indicator,StringType,true),
  StructField(range,
    StructType(
      StructField(currency_code,StringType,true),
      StructField(maxrate,LongType,true),
      StructField(minrate,LongType,true)),true))
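Since getValuesMap only captures the top level, one workaround is to walk nested Rows recursively. The sketch below is my own suggestion, not from the original answer; it assumes the same scala.util.parsing.json classes and handles only struct and array fields:

import org.apache.spark.sql.Row
import scala.util.parsing.json.{JSONArray, JSONObject}

// Recursively turn a Row (including nested struct and array fields) into JSON values.
def rowToJsonValue(value: Any): Any = value match {
  case r: Row    => JSONObject(r.schema.fieldNames.zip(r.toSeq.map(rowToJsonValue)).toMap)
  case s: Seq[_] => JSONArray(s.map(rowToJsonValue).toList)
  case other     => other
}

def convertRowToJsonNested(row: Row): String =
  rowToJsonValue(row).toString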
Also tried Artem's suggestion, but that did not compile:
def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
val sparkContext = sqlContext.sparkContext
import sparkContext._
import sqlContext.implicits._
import sqlContext._
val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
val dataFrame = rowRDD.toDF() //XXX does not compile: toDF() needs an implicit Encoder/Product type, which Row does not provide
dataFrame
}
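For what it's worth, a variant of that idea that should compile (my own sketch, not part of the original post) is to pass the Row's own schema to createDataFrame explicitly and then use the DataFrame's built-in toJSON:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}

def row2Json(row: Row, sqlContext: SQLContext): String = {
  // Build a single-element RDD and attach the Row's own schema explicitly.
  val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
  val dataFrame = sqlContext.createDataFrame(rowRDD, row.schema)
  // toJSON serializes each row of the DataFrame as a JSON string.
  dataFrame.toJSON.first()
}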
Answer
I need to read JSON input and produce JSON output. Most fields are handled individually, but a few JSON sub-objects just need to be preserved.
When Spark reads a DataFrame, it turns each record into a Row. The Row is a JSON-like structure that can be transformed and written out as JSON.
But I need to take some sub-JSON structures out into a string to use as a new field.
This can be done like this:
val dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))
location.address is the path to the sub-JSON object of the incoming JSON-based DataFrame. address_json is the name of the column holding that object converted to a JSON string.
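A slightly fuller sketch of that approach (the input path and column names here are illustrative, not from the original post), assuming Spark 2.1+:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

val spark = SparkSession.builder().appName("row-to-json").getOrCreate()
import spark.implicits._

// Read the incoming JSON records; nested objects become struct columns.
val dataFrame = spark.read.json("venues.json")

// Preserve the nested location.address struct as a single JSON string column.
val dataFrameWithJsonField =
  dataFrame.withColumn("address_json", to_json($"location.address"))

dataFrameWithJsonField.show(truncate = false)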
to_json is implemented in Spark 2.1.
If generating the output JSON with json4s, address_json should be parsed into an AST representation; otherwise the output JSON will have the address_json part escaped.
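A minimal json4s sketch of that point (the field names are illustrative): parse the already-serialized string back into a JValue before embedding it, so it is emitted as nested JSON rather than as an escaped string:

import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._

val addressJson = """{"street":"Main St","city":"Oslo"}"""

// Embedding the raw string escapes it: {"name":"some venue","address":"{\"street\":...}"}
val escaped = compact(render(("name" -> "some venue") ~ ("address" -> addressJson)))

// Parsing to an AST first embeds it as a nested object: {"name":"some venue","address":{"street":...}}
val nested = compact(render(("name" -> "some venue") ~ ("address" -> parse(addressJson))))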