Writing a Pandas DataFrame to Parquet on AWS S3

Question

I want to write my DataFrame to my S3 bucket in Parquet format. I know how to write the DataFrame in CSV format, but I don't know how to write it as Parquet. Here is the code for the CSV format (I don't show the real values of ServerSideEncryption and SSEKMSKeyId, but I use them in my actual code):

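import boto3

s3_client = boto3.client('s3')
# Serialize the DataFrame to CSV text in memory, then upload the bytes
csv_to_write = df.to_csv(None).encode()
s3_client.put_object(Bucket=bucket_name, Key='data.csv', Body=csv_to_write,
                     ServerSideEncryption='XXXXX', SSEKMSKeyId='XXXXXXXX')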

Does anyone have the equivalent for Parquet? Thanks.

Recommended answer

For Python 3.6+, AWS has a library called aws-data-wrangler that helps with the integration between Pandas, S3, and Parquet.

To install it, run:

pip install awswrangler

If you want to write your Pandas DataFrame as a Parquet file to S3, do:

import awswrangler as wr
# Current awswrangler releases (1.0 and later) name the DataFrame parameter `df`
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/my-file.parquet"
)

If you want to add encryption, do:

import awswrangler as wr
# Extra arguments forwarded to the underlying S3 upload calls
extra_args = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "YOUR_KMS_KEY_ARN"
}
# awswrangler 1.0 and later has no wr.Session; pass the arguments directly
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/my-file.parquet",
    s3_additional_kwargs=extra_args
)

