Problem description
Recently I've been getting this error when running Dataflow jobs written in Python. The thing is, it used to work and no code has changed, so I'm thinking it has something to do with the environment.
Can anyone help me?
Recommended answer
In my case, I was using Apache Beam SDK version 2.9.0 and had the same problem.

I used setup.py, and the setup field "install_requires" was filled dynamically by loading the contents of the requirements.txt file. That's okay if you're using DirectRunner, but DataflowRunner is too sensitive to dependencies on local files, so abandoning that technique and hard-coding the dependencies from requirements.txt into "install_requires" solved the issue for me.
If you are stuck on that, try to investigate your dependencies and minimize them as much as you can. Refer to the Managing Python Pipeline Dependencies documentation topic for help. Avoid complex or nested code structures, and avoid dependencies on the local filesystem.
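For completeness, here is a sketch of submitting a pipeline with the setup file passed explicitly via Beam's setup_file option, so worker dependencies come from install_requires rather than local files. The project ID, bucket, and the trivial pipeline itself are placeholders.

```python
# run_pipeline.py -- submit to Dataflow, packaging the job with setup.py
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-gcp-project",             # placeholder project ID
    temp_location="gs://my-bucket/tmp",   # placeholder GCS bucket
    setup_file="./setup.py",              # build and ship the package to workers
)

with beam.Pipeline(options=options) as p:
    (p
     | "Create" >> beam.Create(["hello", "dataflow"])
     | "Print" >> beam.Map(print))
```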