Writing a DataFrame to HDFS in Spark Scala

Question

I have a Spark DataFrame of the form org.apache.spark.sql.DataFrame = [user_key: string, field1: string]. When I use saveAsTextFile to save it to HDFS, each line of the result looks like [12345,xxxxx]. I don't want the opening and closing brackets written to the output file. If I use .rdd to convert it to an RDD, the brackets are still there.

Thanks

Recommended answer

Just concatenate the values and store the resulting strings:

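import org.apache.spark.sql.functions.{col, concat_ws}

// Join every column of df into a single comma-separated string column.
val expr = concat_ws(",", df.columns.map(col): _*)
// Go through .rdd so this also works on Spark 2.x and later, where
// DataFrame.map returns a Dataset, which has no saveAsTextFile.
df.select(expr).rdd.map(_.getString(0)).saveAsTextFile("some_path")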
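
For context, the brackets in the question come from how a Row is rendered as text: its toString joins the field values with mkString("[", ",", "]"), and saveAsTextFile writes each element's toString. The sketch below reproduces that with a plain Seq standing in for a Row, so it runs without Spark; the object and method names are hypothetical and exist only for illustration.

```scala
object RowTextSketch {
  // Mimics how a Row is rendered as text: values wrapped in brackets.
  // (A plain Seq stands in for a Row here; this is an illustrative assumption.)
  def asRowText(fields: Seq[Any]): String =
    fields.mkString("[", ",", "]")

  // Joining the values yourself gives the bracket-free line the question asks for.
  def asDelimited(fields: Seq[Any], sep: String = ","): String =
    fields.mkString(sep)

  def main(args: Array[String]): Unit = {
    val fields = Seq[Any]("12345", "xxxxx")
    println(asRowText(fields))   // prints [12345,xxxxx]
    println(asDelimited(fields)) // prints 12345,xxxxx
  }
}
```

This is why converting with .rdd alone does not help: the elements are still Row objects, so they are still written with brackets until they are mapped to plain strings.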

Or, even better, use spark-csv (Spark 2.0 and later ship an equivalent built-in csv format):

df.write
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .save("some_path")

Another approach is to simply map over the rows:

df.rdd.map(_.toSeq.map(_.toString).mkString(","))

and then save the result, for example with saveAsTextFile.
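
One caveat with the map approach: calling toString on a null value throws a NullPointerException, so a row with a missing field would fail the job. A minimal null-safe sketch follows, again using a plain Seq in place of a Row so it runs without Spark. The helper name toLine and the choice to write nulls as empty fields are assumptions, not part of the original answer.

```scala
object LineFormatSketch {
  // Hypothetical helper: joins one row's values with a separator,
  // writing an empty field for null instead of throwing.
  def toLine(values: Seq[Any], sep: String = ","): String =
    values.map(v => Option(v).map(_.toString).getOrElse("")).mkString(sep)

  def main(args: Array[String]): Unit = {
    println(toLine(Seq("12345", "xxxxx"))) // prints 12345,xxxxx
    println(toLine(Seq("12345", null)))    // prints 12345,
  }
}
```

With Spark, this could be applied as df.rdd.map(row => LineFormatSketch.toLine(row.toSeq)).saveAsTextFile("some_path"). Note that values containing the separator are not quoted or escaped here; if that matters, the CSV writer above is likely the safer choice.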
