Question
I want to write my DataFrame to my S3 bucket in Parquet format. I know how to write the DataFrame in CSV format, but I don't know how to write it in Parquet format. Here is the code for the CSV format (I don't show the ServerSideEncryption and SSEKMSKeyId values, but I use them in my actual code):
csv_to_write = df.to_csv(None).encode()
s3_client.put_object(Bucket=bucket_name, Key='data.csv', Body=csv_to_write,
                     ServerSideEncryption='XXXXX', SSEKMSKeyId='XXXXXXXX')
Does anyone have the equivalent for Parquet? Thanks.
Recommended answer
For Python 3.6+, AWS has a library called aws-data-wrangler that helps with the integration between pandas, S3, and Parquet.
To install it, run:
pip install awswrangler
If you want to write your pandas DataFrame as a Parquet file to S3, do:
import awswrangler as wr

wr.s3.to_parquet(
    df=df,  # the DataFrame is passed as `df` in awswrangler 1.0 and later
    path="s3://my-bucket/key/my-file.parquet",
)
If you want to add encryption, do:
import awswrangler as wr

# Extra arguments forwarded to the underlying S3 requests.
extra_args = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "YOUR_KMS_KEY_ARN",
}
# awswrangler 1.0 and later has no wr.Session; pass the arguments directly.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/my-file.parquet",
    s3_additional_kwargs=extra_args,
)