Question
What is the suggested way of loading data from GCS? The sample code shows copying the data from GCS to the /tmp/ directory. If this is the suggested approach, how much data may be copied to /tmp/?
Answer
While you have that option, you shouldn't need to copy the data over to local disk. You should be able to reference training and evaluation data directly from GCS by using each file's/object's GCS URI, e.g. gs://bucket/path/to/file. You can use these paths anywhere TensorFlow APIs accept local file system paths; TensorFlow supports both reading data from and writing data to GCS.
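As a minimal sketch of this (the bucket name below is hypothetical, and the round trip is shown on a local path since it requires no credentials), `tf.io.gfile.GFile` handles local paths and gs:// URIs through the same interface:

```python
import os
import tempfile

import tensorflow as tf

# The same file API works for local paths and GCS URIs alike.
local_path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with tf.io.gfile.GFile(local_path, "w") as f:
    f.write("hello\n")

with tf.io.gfile.GFile(local_path) as f:
    print(f.read())  # prints "hello"

# With a GCS URI, nothing else changes (requires access to the bucket):
# with tf.io.gfile.GFile("gs://my-bucket/path/to/file") as f:
#     data = f.read()
```

The same applies to higher-level input APIs such as `tf.data.TextLineDataset` or `tf.data.TFRecordDataset`, which accept gs:// paths directly.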
You should also be able to use a prefix to reference a set of matching files, rather than referencing each file individually.
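For example, `tf.io.gfile.glob` expands a pattern into the matching files; the sketch below demonstrates it on local temporary files, but the same call accepts a pattern like "gs://bucket/path/part-*" (bucket name hypothetical) for objects in GCS:

```python
import os
import tempfile

import tensorflow as tf

# Create a few files sharing a common prefix.
d = tempfile.mkdtemp()
for i in range(3):
    with tf.io.gfile.GFile(os.path.join(d, f"part-{i}.txt"), "w") as f:
        f.write(str(i))

# Match the whole set by pattern instead of listing each file individually.
matches = sorted(tf.io.gfile.glob(os.path.join(d, "part-*")))
print(len(matches))  # 3
```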
Follow-up note: you'll want to check out https://cloud.google.com/ml/docs/how-tos/using-external-buckets in case you need to set ACLs on your data so that it is accessible to training.
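As a rough sketch of granting such access with gsutil (the bucket name and service account email below are placeholders; the doc linked above has the account to use):

```shell
# Grant a service account read access to a single object:
gsutil acl ch -u SERVICE_ACCOUNT_EMAIL:READER gs://my-bucket/path/to/file

# Or recursively over all existing objects under the bucket:
gsutil -m acl ch -R -u SERVICE_ACCOUNT_EMAIL:READER gs://my-bucket
```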
Hope that helps.