This article explains how to save a DataFrame as compressed (gzipped) CSV.
Problem description
I use Spark 1.6.0 and Scala.
I want to save a DataFrame in compressed CSV format.
Here is what I have so far (assume I already have df and sc as SparkContext):
// set the conf to the codec I want
sc.getConf.set("spark.hadoop.mapred.output.compress", "true")
sc.getConf.set("spark.hadoop.mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec")
sc.getConf.set("spark.hadoop.mapred.output.compression.type", "BLOCK")

df.write
  .format("com.databricks.spark.csv")
  .save(my_directory)
The output is not in gz format.
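A likely reason the settings above have no effect: SparkContext.getConf returns a copy of the configuration, so .set calls on it are discarded rather than applied to the running context. As a rough sketch (assuming gzip output is the goal), the live Hadoop configuration could be set directly instead, although the spark-csv codec option shown in the answer below is the more direct route:

// Sketch: mutate the live Hadoop configuration instead of a copied SparkConf.
// These keys affect Hadoop-based saves such as saveAsTextFile; whether they
// reach the spark-csv writer is not guaranteed.
sc.hadoopConfiguration.set("mapred.output.compress", "true")
sc.hadoopConfiguration.set("mapred.output.compression.codec",
  "org.apache.hadoop.io.compress.GzipCodec")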
Recommended answer
On the spark-csv github (https://github.com/databricks/spark-csv) one can read:

"codec: compression codec to use when saving to file. Should be the fully qualified name of a class implementing org.apache.hadoop.io.compress.CompressionCodec or one of the case-insensitive shortened names (bzip2, gzip, lz4, and snappy). Defaults to no compression when a codec is not specified."
In your case, this should work:

df.write
  .format("com.databricks.spark.csv")
  .option("codec", "gzip")
  .save("my_directory/my_file.gzip")
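For completeness, here is a minimal end-to-end sketch under the same assumptions (Spark 1.6 with the spark-csv package on the classpath); the sqlContext, sample data, and output path are illustrative placeholders:

import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext named sc.
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// A toy DataFrame to have something to write.
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

df.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  // The fully qualified codec class also works in place of the short name "gzip".
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .save("my_directory/my_file.gzip")

Note that save takes a directory: the compressed output lands in part-*.gz files inside my_directory/my_file.gzip, not in a single standalone .gz file.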