I created a dataframe and converted that df to a parquet file using pyarrow (also mentioned here):
import pyarrow as pa
import pyarrow.parquet as pq

def convert_df_to_parquet(self, df):
    # Convert the pandas DataFrame to an Arrow table
    table = pa.Table.from_pandas(df)
    # Write the table as Parquet into an in-memory buffer
    buf = pa.BufferOutputStream()
    pq.write_table(table, buf)
    return buf
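For reference, a minimal sketch of how this helper might be called and how the raw Parquet bytes can be pulled back out of the returned buffer (the sample DataFrame and the bare call without a real class instance are assumptions for illustration only):

import pandas as pd

# hypothetical sample data, purely for illustration
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

buf = convert_df_to_parquet(None, df)          # None stands in for self in this sketch
parquet_bytes = buf.getvalue().to_pybytes()    # pyarrow Buffer -> plain Python bytes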
Now I want to upload this to an s3 bucket and have tried different input parameters for upload_file(), but nothing I tried works:

s3_client.upload_file(parquet_file, bucket_name, destination_key)  # 1st
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file)  # 2nd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.getvalue())  # 3rd
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())  # 4th
Error:
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=parquet_file.read1())
File "pyarrow/io.pxi", line 376, in pyarrow.lib.NativeFile.read1
File "pyarrow/io.pxi", line 310, in pyarrow.lib.NativeFile.read
File "pyarrow/io.pxi", line 320, in pyarrow.lib.NativeFile.read
File "pyarrow/io.pxi", line 155, in pyarrow.lib.NativeFile.get_input_stream
File "pyarrow/io.pxi", line 170, in pyarrow.lib.NativeFile._assert_readable
OSError: only valid on readonly files
Best answer
From the doc, you should do something like this:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.upload_file('/tmp/'+parquet_file, bucket_name, parquet_file)
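If you want to skip the temporary file and upload the in-memory buffer from the question directly, something along these lines should also work. This is a sketch, assuming bucket_name and destination_key are the same names used above; the key step is converting the pyarrow Buffer to plain bytes before handing it to put_object:

import boto3

s3_client = boto3.client('s3')

buf = convert_df_to_parquet(None, df)    # BufferOutputStream from the question
body = buf.getvalue().to_pybytes()       # pyarrow Buffer -> bytes accepted by boto3
s3_client.put_object(Bucket=bucket_name, Key=destination_key, Body=body)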
Regarding "python - How to write pyarrow Parquet data to an s3 bucket?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58818227/