I am trying to save large files from Google App Engine's Blobstore to Google Cloud Storage to make backups easier.
It works for smaller files, but not for larger ones.
My code:
PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
    with files.open(write_path, 'a') as fp:
        while True:
            buf = br.read(100000)
            if buf=="": break
            fp.write(buf)
    files.finalize(write_path)
(This runs in a task queue to avoid exceeding the execution time limit.)
It raises a FileNotOpenedError:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354754771592783168/processFiles.py", line 249, in post
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 281, in __exit__
    self.close()
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 275, in close
    self._make_rpc_call_with_retry('Close', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 179, in _raise_app_error
    raise FileNotOpenedError()
I have investigated further, and according to a comment on GAE Issue 5371, the Files API closes the file every 30 seconds. I have not seen this documented anywhere else.
I tried to work around this by closing and reopening the file at intervals, with a 0.5 second pause between the close and the open (the code below is edited from the first version of this post). It now throws a WrongOpenModeError.
My code (updated):
PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
    fn = df.blob.filename
    br = blobstore.BlobReader(df.blob)
    write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
    fp = files.open(write_path, 'a')
    c = 0
    while True:
        if (c == 5):
            c = 0
            fp.close()
            files.finalize(write_path)
            time.sleep(0.5)
            fp = files.open(write_path, 'a')
        c = c + 1
        buf = br.read(100000)
        if buf=="": break
        fp.write(buf)
    files.finalize(write_path)
Stack trace:
Traceback (most recent call last):
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
    return handler.dispatch()
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~simplerepository/1.354894420907462278/processFiles.py", line 267, in get
    fp.write(buf)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 310, in write
    self._make_rpc_call_with_retry('Append', request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 388, in _make_rpc_call_with_retry
    _make_call(method, request, response)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 236, in _make_call
    _raise_app_error(e)
  File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/files/file.py", line 188, in _raise_app_error
    raise WrongOpenModeError()
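I have tried to find information on the WrongOpenModeError, but the only place it is mentioned is in appengine.api.files.file.py itself.
Suggestions on how to get around this and save large files to Google Cloud Storage as well would be much appreciated. Thanks!
Best answer
I had the same problem and ended up writing an iterator around fetching the data and catching the exception. It works, but it is a workaround.
Rewriting your code would look something like this: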
from google.appengine.ext import blobstore
from google.appengine.api import files

def iter_blobstore(blob, fetch_size=524288):
    # Yield the blob's contents in chunks of at most fetch_size bytes.
    start_index = 0
    # end_index is inclusive in blobstore.fetch_data, hence the -1.
    end_index = fetch_size - 1
    while True:
        read = blobstore.fetch_data(blob, start_index, end_index)
        if read == "":
            break
        start_index += fetch_size
        end_index += fetch_size
        yield read

PATH = '/gs/backupbucket/'
for df in DocumentFile.all():
    fn = df.blob.filename
    write_path = files.gs.create(self.PATH+fn.encode('utf-8'), mime_type='application/zip',acl='project-private')
    with files.open(write_path, 'a') as fp:
        for buf in iter_blobstore(df.blob):
            try:
                fp.write(buf)
            except files.FileNotOpenedError:
                # Workaround: swallow the error; the chunk whose write failed is not retried.
                pass
    files.finalize(write_path)
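One detail worth checking in any chunked read like this is the index arithmetic: blobstore.fetch_data treats end_index as inclusive, so consecutive ranges should not share a byte. The sketch below works that arithmetic out in plain Python, with no App Engine dependency. The names iter_inclusive_ranges, fake_fetch_data and copy_in_chunks are hypothetical, and fake_fetch_data is only a stand-in that mimics the inclusive-end behaviour on an in-memory byte string.

```python
def iter_inclusive_ranges(total_size, fetch_size=524288):
    """Yield (start_index, end_index) pairs, end inclusive, covering total_size bytes."""
    start = 0
    while start < total_size:
        # The last range is clipped so it never runs past the final byte.
        end = min(start + fetch_size, total_size) - 1
        yield start, end
        start = end + 1


def fake_fetch_data(data, start_index, end_index):
    # Hypothetical stand-in for blobstore.fetch_data: end_index is inclusive.
    return data[start_index:end_index + 1]


def copy_in_chunks(data, fetch_size):
    # Reassemble the data from its chunks to show nothing is lost or duplicated.
    return b"".join(
        fake_fetch_data(data, start, end)
        for start, end in iter_inclusive_ranges(len(data), fetch_size)
    )
```

This version needs the total size up front (on App Engine, a BlobInfo's size property would be the natural source), which also avoids the extra fetch that only returns an empty string to signal the end.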
Regarding "python - Google App Engine: How to write large files to Google Cloud Storage", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/8201283/