This article describes how to import a large CSV file from Google Cloud Storage into the App Engine Datastore, which may be a useful reference for anyone facing the same problem.

Problem Description

I have a large CSV file, on the order of 1 GB, and want to create entities in the datastore, one entity per row.

That CSV file currently resides in Google Cloud Storage. Is there a clean way to do this? All the examples I can find online seem to rely on having the CSV file locally, or don't look like they would scale very well. Ideally there would be a streaming API that lets me read small enough pieces from Cloud Storage to make update calls to the Datastore, but I haven't been able to find anything like that.

Recommended Answer

The buffer you receive when you open a GCS file is a streaming buffer, which can be pickled. But GCS does not support the iterator protocol for reading the lines of a CSV, so you have to write your own wrapper. Like:

import csv
import logging
import cloudstorage as gcs

with gcs.open('/app_default_bucket/csv/example.csv', 'r') as f:
    csv_reader = csv.reader(iter(f.readline, ''))  # the GCS buffer is not iterable, so wrap readline
    for row in csv_reader:
        logging.info(' - '.join(row))
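To turn each row into a Datastore entity, the rows can then be buffered and written in batches. Below is a minimal sketch of that idea, assuming a hypothetical ndb model named CsvRow; batches are kept at 500 rows to stay under the Datastore's per-call entity limit.

from google.appengine.ext import ndb
import cloudstorage as gcs
import csv

class CsvRow(ndb.Model):
    # hypothetical model: one entity per CSV row
    fields = ndb.StringProperty(repeated=True)

def import_csv(gcs_path):
    batch = []
    with gcs.open(gcs_path, 'r') as f:
        for row in csv.reader(iter(f.readline, '')):
            batch.append(CsvRow(fields=row))
            if len(batch) >= 500:  # write in small batches instead of one huge put
                ndb.put_multi(batch)
                batch = []
    if batch:
        ndb.put_multi(batch)  # flush any remaining rows

For a file around 1 GB this is unlikely to finish within a normal request deadline, so in practice it would be kicked off from a task queue task or split across several tasks.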

If you are familiar with the blobstore, you can use it to read large CSVs from GCS using blobstore.create_gs_key("/gs" + <gcs_file_name_here>). Example here.
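As a rough sketch of that blobstore approach (the bucket and file name below are placeholders), the generated key can be handed to a BlobReader, which behaves like a read-only file and can be fed to csv.reader with the same readline wrapper as above:

from google.appengine.ext import blobstore
import csv
import logging

# build a blob key that points at the GCS object (placeholder path)
gs_key = blobstore.create_gs_key('/gs/your-bucket/csv/example.csv')
blob_reader = blobstore.BlobReader(gs_key, buffer_size=1024 * 1024)

for row in csv.reader(iter(blob_reader.readline, '')):
    logging.info(' - '.join(row))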

This concludes the article on importing a large CSV from Cloud Storage into the App Engine Datastore; hopefully the recommended answer above is helpful.
