Python boto: listing the contents of a specific directory in a bucket

Question

I have S3 access to only one specific directory in an S3 bucket.

For example, if I try to list the whole bucket with the s3cmd command:

    $ s3cmd ls s3://bucket-name

I get an error: Access to bucket 'my-bucket-url' was denied

But if I try to access a specific directory in the bucket, I can see its contents:

    $ s3cmd ls s3://bucket-name/dir-in-bucket

Now I want to connect to the S3 bucket with Python boto. Similarly, with:

    bucket = conn.get_bucket('bucket-name')

I get an error: boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden

But if I try:

    bucket = conn.get_bucket('bucket-name/dir-in-bucket')

The script stalls for about 10 seconds and then prints an error. Below is the full trace. Any idea how to proceed?

Note: this question is about the boto version 2 module, not boto3.

Traceback (most recent call last):
  File "test_s3.py", line 7, in <module>
    bucket = conn.get_bucket('bucket-name/dir-name')
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 471, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 490, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 633, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1046, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 922, in _mexe
    request.body, request.headers)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

Recommended answer

By default, when you call get_bucket in boto, it tries to validate that you actually have access to that bucket by performing a HEAD request on the bucket URL. In this case you don't want boto to do that, since you don't have access to the bucket itself. So, do this:

    bucket = conn.get_bucket('my-bucket-url', validate=False)

Then you should be able to do something like this to list the objects:

    for key in bucket.list(prefix='dir-in-bucket'):
        print(key.name)  # replace with your own processing
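Putting the two steps together, a minimal sketch might look like the following. The helper name `list_dir_keys` is hypothetical and not part of boto. It takes an already-created boto 2 connection as `conn` and relies only on `get_bucket(..., validate=False)`, `bucket.list(prefix=...)` and `key.name`.

```python
def list_dir_keys(conn, bucket_name, prefix):
    """Return the names of all keys under `prefix` (hypothetical helper).

    `conn` is assumed to be a boto 2 S3Connection, e.g. from boto.connect_s3().
    """
    # validate=False skips the HEAD request on the bucket itself, which
    # fails with 403 when only a sub-directory is accessible.
    bucket = conn.get_bucket(bucket_name, validate=False)
    # The listing request is only sent when the result is iterated.
    return [key.name for key in bucket.list(prefix=prefix)]


# Example usage (needs boto 2 and valid credentials, so it is commented out):
# import boto
# conn = boto.connect_s3()
# for name in list_dir_keys(conn, 'bucket-name', 'dir-in-bucket'):
#     print(name)
```

Passing the connection in as an argument keeps the helper independent of how credentials are configured.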

If you still get a 403 error, try adding a slash at the end of the prefix.

    for key in bucket.list(prefix='dir-in-bucket/'):
        print(key.name)  # replace with your own processing
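The trailing slash can matter because S3 has no real directories: a "directory" is only a key-name prefix, and a prefix is matched as a plain string. One possible reason for the 403 (an assumption, since the access policy is not shown in the question) is that the policy allows listing only for prefixes that begin with `dir-in-bucket/`. The sketch below does not call S3. It simulates prefix matching over made-up key names to show how the two prefixes differ.

```python
# Hypothetical key names, invented for illustration only.
KEYS = [
    "dir-in-bucket/report.csv",
    "dir-in-bucket/sub/data.json",
    "dir-in-bucket-old/report.csv",
    "other-dir/readme.txt",
]


def keys_with_prefix(keys, prefix):
    """Mimic S3 prefix filtering: a plain string match on the key name."""
    return [k for k in keys if k.startswith(prefix)]


# Without the slash, the sibling "dir-in-bucket-old/" matches as well.
print(keys_with_prefix(KEYS, "dir-in-bucket"))
# With the slash, only keys inside "dir-in-bucket/" match.
print(keys_with_prefix(KEYS, "dir-in-bucket/"))
```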

Note: this answer was written for the boto version 2 module, which is now obsolete. As of 2020, boto3 is the standard module for working with AWS. For more information, see the question "What is the difference between the AWS boto and boto3".
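For readers who are on boto3, a rough equivalent is sketched below. It is not part of the original answer. The helper name `list_keys_boto3` is hypothetical; it uses the boto3 client's `list_objects_v2` paginator and takes the client as an argument. The bucket name and prefix in the usage comment are the placeholders from the question.

```python
def list_keys_boto3(s3_client, bucket_name, prefix):
    """Return all key names under `prefix` (hypothetical helper).

    `s3_client` is assumed to be the object returned by boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    # The paginator follows continuation tokens until the listing is complete.
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example usage (needs boto3 and valid credentials, so it is commented out):
# import boto3
# for name in list_keys_boto3(boto3.client("s3"), "bucket-name", "dir-in-bucket/"):
#     print(name)
```

This sketch sends only the listing request with the given prefix, so no counterpart to `validate=False` appears in it.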
