Problem Description
I am trying to migrate a fairly large amount of data from GCS to AppEngine via the task queue and 20 backend instances. The issue is that the new Cloud Storage library does not seem to respect the urlfetch timeout, or something else is going on.
import cloudstorage as gcs

# Set the default retry behaviour for subsequent GCS calls in this request.
gcs.set_default_retry_params(gcs.RetryParams(urlfetch_timeout=60,
                                             max_retry_period=300))
...
with gcs.open(fn, 'r') as fp:
    raw_gcs_file = fp.read()
The above works just fine when the queue is paused and I run one task at a time, but when I try to run 20 concurrent tasks against the 20 backends, the following starts happening:
I 2013-07-20 00:18:16.418 Got exception while contacting GCS. Will retry in 0.2 seconds.
I 2013-07-20 00:18:16.418 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:21.553 Got exception while contacting GCS. Will retry in 0.4 seconds.
I 2013-07-20 00:18:21.554 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:25.728 Got exception while contacting GCS. Will retry in 0.8 seconds.
I 2013-07-20 00:18:25.728 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:31.428 Got exception while contacting GCS. Will retry in 1.6 seconds.
I 2013-07-20 00:18:31.428 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:34.301 Got exception while contacting GCS. Will retry in -1 seconds.
I 2013-07-20 00:18:34.301 Unable to fetch URL: https://storage.googleapis.com/<removed>
I 2013-07-20 00:18:34.301 Urlfetch retry 5 failed after 22.8741798401 seconds total
How can it fail after only 22 seconds? It doesn't seem to be using the retry params at all.
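For reference, the library also lets you pass retry parameters explicitly on each call, which can be a useful sanity check when the per-request defaults don't seem to take effect. A minimal sketch, assuming gcs.open() accepts an explicit retry_params keyword (that argument is my assumption about the library's signature, not something shown in the original post):

import cloudstorage as gcs

# Assumption: passing retry_params directly to gcs.open() overrides
# whatever set_default_retry_params() did (or failed to do).
params = gcs.RetryParams(urlfetch_timeout=60, max_retry_period=300)

with gcs.open(fn, 'r', retry_params=params) as fp:
    raw_gcs_file = fp.read()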
Answer
This is a bug in the GCS client library. It will be fixed soon. Thanks!
Your hack will work. But if it still times out frequently, you can try fp.read(size=some_size). If your files are large, then with a 32 MB response (the URLfetch response size limit) and a 90-second deadline, that assumes a transfer rate of about 364 KB/s.
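To make that concrete, here is a minimal sketch of reading the object in fixed-size chunks instead of a single fp.read(); the CHUNK_SIZE value and the read_in_chunks helper name are made up for illustration:

import cloudstorage as gcs

CHUNK_SIZE = 1024 * 1024  # 1 MB per read; tune to your observed transfer rate

def read_in_chunks(fn):
    # Read a GCS object piece by piece so each underlying urlfetch stays
    # small and well under the response-size limit and deadline.
    parts = []
    with gcs.open(fn, 'r') as fp:
        while True:
            chunk = fp.read(size=CHUNK_SIZE)
            if not chunk:
                break
            parts.append(chunk)
    return ''.join(parts)

With reads of this size, a slow or failed fetch only costs a retry of that one chunk rather than restarting the whole transfer.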