This post addresses the question: how do I specify the path that saveAsTable saves files to?
Problem description
I am trying to save a DataFrame to S3 in PySpark on Spark 1.4 using DataFrameWriter:
# Load JSON from S3, then write it back out as a partitioned Parquet table
df = sqlContext.read.format("json").load("s3a://somefile")
df_writer = pyspark.sql.DataFrameWriter(df)  # equivalent to df.write
df_writer.partitionBy('col1')\
    .saveAsTable('test_table', format='parquet', mode='overwrite')
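As an aside, constructing DataFrameWriter directly works, but the more common idiom since Spark 1.4 is the df.write property, which returns the same writer. A minimal equivalent sketch, using the same placeholder table and column names as above:

# Equivalent write via the DataFrame.write property
df.write.partitionBy('col1')\
    .saveAsTable('test_table', format='parquet', mode='overwrite')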
The Parquet files went to "/tmp/hive/warehouse/....", which is a local tmp directory on my driver.
I did set hive.metastore.warehouse.dir in hive-site.xml to an "s3a://...." location, but Spark doesn't seem to respect my Hive warehouse setting.
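For reference, the same warehouse setting can also be applied programmatically on the HiveContext rather than through hive-site.xml; a minimal sketch, with a placeholder bucket URL (and, as observed above, saveAsTable in Spark 1.4 may still ignore it without an explicit path):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="warehouse-config-sketch")
sqlContext = HiveContext(sc)
# Same key as the hive-site.xml property; the URL is a placeholder
sqlContext.setConf("hive.metastore.warehouse.dir", "s3a://bucket/warehouse")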
Recommended answer
Use path:
# Passing path explicitly sends the table data to S3 instead of the
# default local Hive warehouse directory
df_writer.partitionBy('col1')\
    .saveAsTable('test_table', format='parquet', mode='overwrite',
                 path='s3a://bucket/foo')
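To sanity-check the result, the data can be read back either through the metastore table name or straight from the S3 location; a short sketch using the same placeholder bucket as the answer:

# Read back via the table name registered in the metastore
df_back = sqlContext.table('test_table')
# Or read the Parquet files directly from the explicit path
df_direct = sqlContext.read.parquet('s3a://bucket/foo')
df_direct.show()

With an explicit path, the table's files live at that location rather than under the warehouse directory, which is why this sidesteps the ignored hive.metastore.warehouse.dir setting.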
That concludes this look at how to specify the path that saveAsTable saves files to; hopefully the recommended answer helps.