Question
Could somebody please clarify the expected behavior when using save_main_session with custom modules imported in __main__? My Dataflow pipeline imports two non-standard modules - one via requirements.txt and the other via setup_file. Unless I move the imports into the functions where they are used, I keep getting import/pickling errors; a sample error is below. From the documentation I assumed that setting save_main_session would solve this problem, but it does not (see the error below). So I wonder whether I missed something or this behavior is by design. The same imports work fine when placed inside the functions.
Error:
File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
__import__(module)
ImportError: No module named jmespath
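As a point of comparison, the workaround described in the question - moving the import into the function body - can be sketched with plain pickle (the function and field names here are illustrative; Beam serializes your DoFns and lambdas in much the same way):

```python
import pickle

# Workaround pattern from the question: the import lives inside the
# function, so it is resolved on the worker at call time rather than
# captured from the __main__ session at pickle time.
def parse_name(record):
    import json  # local import: re-executed wherever the function runs
    return json.loads(record)["name"]

# Module-level functions pickle by reference; after unpickling, the body
# (including the local import) runs fresh in the new process.
restored = pickle.loads(pickle.dumps(parse_name))
print(restored('{"name": "beam"}'))  # -> beam
```

The same idea applies to a third-party module such as jmespath: with the import inside the function, the worker performs the import itself instead of depending on pickled main-session state.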
Answer
See the Dataflow FAQ entry on handling NameErrors (https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors) and the Beam guide to managing Python pipeline dependencies (https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/) for when to use --save_main_session.
The setup that works best for me is having a dataflow_launcher.py sitting at the project root with your setup.py. The only thing it does is import your pipeline file and launch it. Use setup.py to handle all your dependencies. This is the best example I've found so far:
https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset
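A minimal sketch of that layout, with hypothetical names (the my_pipeline package and file names are assumptions; the only detail taken from the question is the jmespath dependency from the ImportError above):

```python
# Hypothetical project layout:
#   setup.py              <- passed to Dataflow via --setup_file
#   dataflow_launcher.py  <- imports the pipeline package and runs it
#   my_pipeline/
#       __init__.py
#       pipeline.py       <- the actual Beam pipeline code
#
# setup.py - workers install this package, so third-party imports such
# as jmespath resolve on the worker instead of relying on pickled
# __main__ state.
from setuptools import setup, find_packages

if __name__ == "__main__":
    setup(
        name="my_pipeline",             # hypothetical package name
        version="0.0.1",
        packages=find_packages(),
        install_requires=["jmespath"],  # the module from the error above
    )
```

You would then launch with something like `python dataflow_launcher.py --runner DataflowRunner --setup_file ./setup.py ...`, using the standard Beam pipeline options.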