Converting a Row to json in Spark 2 Scala

This article covers how to convert a Row object to json in Spark 2 Scala.

Problem description


Is there a simple way to convert a given Row object to json?

Found this about converting a whole Dataframe to json output: Spark Row to JSON

But I just want to convert one Row to json. Here is pseudo code for what I am trying to do.

More precisely, I am reading json as input in a Dataframe. I am producing a new output that is mainly based on columns, but with one json field for all the info that does not fit into the columns.

My question is: what is the easiest way to write this function, convertRowToJson()?

def convertRowToJson(row: Row): String = ???

def transformVenueTry(row: Row): Try[Venue] = {
  Try({
    val name = row.getString(row.fieldIndex("name"))
    val metadataRow = row.getStruct(row.fieldIndex("meta"))
    val score: Double = calcScore(row)
    val combinedRow: Row = metadataRow ++ ("score" -> score) // pseudo: Row has no ++; append a score field
    val jsonString: String = convertRowToJson(combinedRow)
    Venue(name = name, json = jsonString)
  })
}

Psidom's solution:

import scala.util.parsing.json.JSONObject

def convertRowToJSON(row: Row): String = {
  val m = row.getValuesMap(row.schema.fieldNames)
  JSONObject(m).toString()
}

only works if the Row has a single level; it fails with nested Rows. This is the schema:

StructType(
    StructField(indicator,StringType,true),
    StructField(range,
        StructType(
            StructField(currency_code,StringType,true),
            StructField(maxrate,LongType,true),
            StructField(minrate,LongType,true)),true))

Also tried Artem's suggestion, but that did not compile:

def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
  val sparkContext = sqlContext.sparkContext
  import sparkContext._
  import sqlContext.implicits._
  import sqlContext._
  val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
  val dataFrame = rowRDD.toDF() //XXX does not compile
  dataFrame
}
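A likely reason the line above does not compile is that `toDF()` needs an implicit Encoder, which does not exist for a plain `Row`. A sketch that side-steps this (assuming the Row carries its schema, as Rows read from a DataFrame do) passes the schema to `createDataFrame` explicitly:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}

// Wrap a single Row back into a one-row DataFrame by reusing its own schema.
def row2DataFrame(row: Row, sqlContext: SQLContext): DataFrame = {
  val rowRDD: RDD[Row] = sqlContext.sparkContext.makeRDD(row :: Nil)
  sqlContext.createDataFrame(rowRDD, row.schema)
}
```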

Recommended answer

I need to read json input and produce json output. Most fields are handled individually, but a few json sub-objects just need to be preserved.

When Spark reads a dataframe it turns each record into a Row. A Row is a json-like structure that can be transformed and written out as json.

But I need to pull some sub json structures out into a string to use as a new field.

This can be done like this:

dataFrameWithJsonField = dataFrame.withColumn("address_json", to_json($"location.address"))

location.address is the path to the sub json object in the incoming json-based dataframe. address_json is the column name for that object converted to a json string.

to_json was implemented in Spark 2.1.
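Putting the pieces together, a minimal end-to-end sketch (Spark 2.1+) might look like the following; the input path and field names are illustrative, not from the original post:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.to_json

val spark = SparkSession.builder().appName("row-to-json").getOrCreate()
import spark.implicits._

// Read the json input; "venues.json" is a hypothetical path.
val dataFrame = spark.read.json("venues.json")

// Serialize the nested location.address struct into a string column.
val dataFrameWithJsonField =
  dataFrame.withColumn("address_json", to_json($"location.address"))
```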

If the output json is generated with json4s, address_json should first be parsed into an AST representation; otherwise the output json will have the address_json part escaped.
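A small sketch of that parsing step, assuming json4s with the Jackson backend (the `embed` helper and its field names are hypothetical):

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods.{compact, parse, render}

// Embedding address_json as a plain JString would escape it in the output,
// e.g. "{\"street\":...}". Parsing it back into an AST keeps it a sub-object.
def embed(name: String, addressJson: String): String = {
  val record: JValue =
    JObject("name" -> JString(name), "address" -> parse(addressJson))
  compact(render(record))
}
```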

This concludes the article on how to convert a Row to json in Spark 2 Scala; hopefully the recommended answer is of help.


09-06 22:47