Question
I have a large CSV file, on the order of 1 GB, and want to create entities in the Datastore, one entity per row.
That CSV file is currently residing in Google Cloud Storage. Is there a clean way to do this? All the examples I can find online seem to rely on having the CSV file locally, or don't look like they would scale very well. Ideally there would be a streaming API that lets me read small enough pieces from Cloud Storage to make update calls to the Datastore, but I haven't been able to find anything like that.
Answer
The buffer you receive when you open a GCS file is a streaming buffer, which can be pickled. But GCS does not support the iterator protocol for reading the lines of the CSV, so you have to write your own wrapper, like:
import csv
import logging
import cloudstorage as gcs  # GoogleAppEngineCloudStorageClient library

with gcs.open('/app_default_bucket/csv/example.csv', 'r') as f:
    csv_reader = csv.reader(iter(f.readline, ''))
    for row in csv_reader:
        logging.info(' - '.join(row))
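Since the goal is one Datastore entity per row, you can combine that reader with batched writes. Here is a minimal sketch under assumptions not in the original answer: a hypothetical CsvRow ndb model, the same example file path, and a batch size of 500 (the per-call put limit).

import csv
import cloudstorage as gcs
from google.appengine.ext import ndb

class CsvRow(ndb.Model):
    # Hypothetical model; replace with your real entity kind and properties.
    line = ndb.StringProperty()

def import_csv(path='/app_default_bucket/csv/example.csv', batch_size=500):
    batch = []
    with gcs.open(path, 'r') as f:
        for row in csv.reader(iter(f.readline, '')):
            batch.append(CsvRow(line=' - '.join(row)))
            if len(batch) >= batch_size:
                ndb.put_multi(batch)  # flush entities in batches to keep memory bounded
                batch = []
    if batch:
        ndb.put_multi(batch)  # write any remaining rows

Reading with f.readline and flushing every batch_size rows keeps only a small slice of the 1 GB file in memory at any time.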
If you are familiar with the blobstore, you can use it to read large CSVs from GCS using blobstore.create_gs_key("/gs" + <gcs_file_name_here>). Example here.
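As a rough sketch of that blobstore route (the bucket and object name below are assumptions; create_gs_key maps a GCS object to a blob key, and BlobReader gives you a file-like object over it):

import csv
import logging
from google.appengine.ext import blobstore

# Assumed GCS path; format is '/gs/<bucket>/<object>'.
gs_key = blobstore.create_gs_key('/gs/app_default_bucket/csv/example.csv')
reader = blobstore.BlobReader(gs_key)
for row in csv.reader(iter(reader.readline, '')):
    logging.info(' - '.join(row))
reader.close()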