Problem description
After creating a TensorFlow 1.4 model for Python 3, I found that Google Cloud ML Engine currently supports only Python 2.7.
Back-porting my Python 3 code at first seemed simple enough: some scripts still work as expected when I replace their shebang #!/usr/bin/env python3 with #!/usr/bin/env python. In my (macOS) environment, python -V reports 2.7.10.
Yet one script does not react so gracefully. When I run it now, it produces "Segmentation fault: 11" without any previous warnings or other diagnostic output.
How can I find the root cause, so that I know what else to change to make that script work under Python 2 as well?
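A segmentation fault kills the interpreter before Python can print a traceback. The standard-library faulthandler module (Python 3.3+) installs a handler that dumps the Python stack of every thread when the process receives SIGSEGV. This shows which Python line was executing when the native crash happened. For Python 2.7, the same module is available as a separately installed faulthandler backport on PyPI, which is an assumption about your environment. A minimal sketch:

```python
import faulthandler
import sys

# Dump the Python traceback of all threads to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. Enable it as early as possible.
faulthandler.enable(file=sys.stderr, all_threads=True)


def main():
    # The rest of the (hypothetical) training script would go here.
    pass


if __name__ == "__main__":
    main()
```

On Python 3.3+, the same handler can also be enabled without code changes by running python -X faulthandler script.py or by setting the PYTHONFAULTHANDLER environment variable.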
UPDATE: The segmentation fault apparently occurs during a call to session.run(get_next), where get_next is obtained from a tf.data.Iterator as follows:
iterator = dataset.make_initializable_iterator()
get_next = iterator.get_next()
Recommended answer
There are two issues here: one is about Python 3 support and the other is about the segfault.
Python 3 support
Cloud ML Engine now supports Python 3 via the 'pythonVersion' field when submitting jobs (see the API reference docs).
If you are using gcloud, you will need to create a config file like this (let's name it config.yaml):
trainingInput:
  pythonVersion: "3.5"
When you submit your job, point gcloud to that file, e.g.:
gcloud ml-engine jobs submit training --config=config.yaml ...
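If jobs are submitted from a script, the config file from this answer can also be generated programmatically. The sketch below uses only the standard library. The file name config.yaml follows the answer, while the helper write_training_config is hypothetical. Note that pythonVersion has to be nested (indented) under trainingInput:

```python
import os
import tempfile


def write_training_config(path, python_version="3.5"):
    # Hypothetical helper: writes the minimal gcloud training config,
    # with pythonVersion indented under trainingInput.
    content = 'trainingInput:\n  pythonVersion: "%s"\n' % python_version
    with open(path, "w") as f:
        f.write(content)
    return content


if __name__ == "__main__":
    # Write the file into a scratch directory and show what was written.
    out_dir = tempfile.mkdtemp()
    config_path = os.path.join(out_dir, "config.yaml")
    print(write_training_config(config_path))
```

The resulting path is what gets passed to gcloud via --config.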
Segfault
This may be caused by running out of memory. Please check the memory usage for that job in the console. That said, if the job dies abruptly, the reported memory usage may not accurately reflect what it was at the time of failure.
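Because the console may miss the peak, one option (an addition, not part of the original answer) is to have the job log its own peak memory as it runs, for example before each session.run call. Below is a minimal sketch using the Unix-only resource module. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS. The function names are hypothetical:

```python
import resource
import sys


def peak_rss_mb():
    # Peak resident set size of this process so far.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024.0 * 1024.0)
    return peak / 1024.0


def log_peak_memory(tag):
    # Hypothetical helper: print and flush immediately so the line reaches the
    # job logs even if the process is killed right afterwards.
    print("[%s] peak RSS: %.1f MB" % (tag, peak_rss_mb()))
    sys.stdout.flush()


if __name__ == "__main__":
    log_peak_memory("startup")
```

If the logged peak keeps climbing toward the machine's memory limit just before the crash, running out of memory is the likely cause.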