python - 使用python2.7从Amazon s3读取csv

我可以轻松地从s3获取存储桶名称，但是当我从s3读取csv文件时，每次都会出错。

import boto3
import pandas as pd

s3 = boto3.client('s3',
         aws_access_key_id='yyyyyyyy',
         aws_secret_access_key='xxxxxxxxxxx')
# Call S3 to list current buckets
response = s3.list_buckets()
for bucket in response['Buckets']:
    print bucket['Name']

output
s3-bucket-data

。

import pandas as pd
import StringIO
from boto.s3.connection import S3Connection

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('s3-bucket-data')

fileName = "data.csv"

content = bucket.get_key(fileName).get_contents_as_string()
reader = pd.read_csv(StringIO.StringIO(content))

出现错误-

boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request

我如何从s3中读取csv？

最佳答案

你可以使用s3fs包

s3fs还支持凭证文件中的aws配置文件。

这是一个示例(您不必对其进行分块，但是我只用了这个示例)，

import os
import pandas as pd
import s3fs
import gzip

chunksize = 999999
usecols = ["Col1", "Col2"]

filename = 'some_csv_file.csv.gz'
s3_bucket_name = 'some_bucket_name'

AWS_KEY = 'yyyyyyyyyy'
AWS_SECRET = 'xxxxxxxxxx'
s3f = s3fs.S3FileSystem(
    anon=False,
    key=AWS_KEY,
    secret=AWS_SECRET)

# or if you have a profile defined in credentials file:
#aws_shared_credentials_file = 'path/to/aws/credentials/file/'
#os.environ['AWS_SHARED_CREDENTIALS_FILE'] = aws_shared_credentials_file
#s3f = s3fs.S3FileSystem(
#    anon=False,
#    profile_name=s3_profile)

filepath = os.path.join(s3_bucket_name, filename)
with s3f.open(filepath, 'rb') as f:
    gz = gzip.GzipFile(fileobj=f)  # Decompress data with gzip

    chunks = pd.read_csv(gz,
                            usecols=usecols,
                            chunksize=chunksize,
                            iterator=True,
                            )

    df = pd.concat([c for c in chunks], axis=1)

关于python - 使用python2.7从Amazon s3读取csv，我们在Stack Overflow上找到一个类似的问题：https://stackoverflow.com/questions/43345907/

读取CSV

python - 使用python2.7从Amazon s3读取csv