Google DataFlow/Python: Import errors with save_main_session and custom modules in __main__

Question

Could somebody please clarify the expected behavior when using save_main_session together with custom modules imported in __main__? My DataFlow pipeline imports two non-standard modules: one installed via requirements.txt and the other via setup_file. Unless I move the imports into the functions where they are used, I keep getting import/pickling errors. A sample error is below. From the documentation, I assumed that setting save_main_session would solve this problem, but it does not (see the error below). So I wonder whether I missed something or whether this behavior is by design. The same import works fine when placed inside a function.

Error:


  File "/usr/lib/python2.7/pickle.py", line 1130, in find_class
    __import__(module)
ImportError: No module named jmespath

Recommended answer

Managing pipeline dependencies: https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

When to use --save_main_session: https://cloud.google.com/dataflow/faq#how-do-i-handle-nameerrors
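To illustrate the ImportError in the question: the traceback ends in pickle's find_class, which re-imports a module by name on the machine doing the unpickling. The sketch below reproduces that mechanism with the standard library only. It is not Beam's session-saving code, and fake_dep is a made-up stand-in for a dependency such as jmespath. The point, as far as I understand it, is that a saved session records imported modules by name, so the module still has to be installed on the worker.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party dependency such as jmespath.
fake_dep = types.ModuleType("fake_dep")
exec("def search(expr, data):\n    return data.get(expr)", fake_dep.__dict__)
sys.modules["fake_dep"] = fake_dep

# "Launcher" side: pickling a module-level function stores only the module
# name and the attribute name, not the function's code.
payload = pickle.dumps(fake_dep.search)

# "Worker" side without the dependency installed: unpickling has to import
# fake_dep by name, and that import fails.
del sys.modules["fake_dep"]
try:
    pickle.loads(payload)
    error = None
except ImportError as exc:
    error = exc
print(type(error).__name__, error)

# "Worker" side with the dependency available: the same payload loads fine.
sys.modules["fake_dep"] = fake_dep
restored = pickle.loads(payload)
```

So the pickled data alone is not enough; the dependency has to reach the workers through requirements.txt or setup.py.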

The setup that works best for me is a dataflow_launcher.py sitting at the project root next to your setup.py. The only thing it does is import your pipeline file and launch it. Use setup.py to handle all your dependencies. This is the best example I've found so far:

https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset
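A minimal sketch of such a launcher, under some assumptions: the package name my_package.pipeline and its run(argv) entry point are hypothetical, not taken from the linked example, so adjust them to your project layout.

```python
# dataflow_launcher.py: lives at the project root, next to setup.py.
import importlib
import sys

# Hypothetical dotted path to the module that builds the pipeline.
PIPELINE_MODULE = "my_package.pipeline"


def launch(module_name=PIPELINE_MODULE, argv=None):
    """Import the pipeline module and pass the command-line flags to its run()."""
    if argv is None:
        argv = sys.argv[1:]
    # The pipeline code is imported as a regular package module, so nothing
    # of substance is defined in __main__ itself.
    pipeline_module = importlib.import_module(module_name)
    # Assumption: the pipeline module exposes run(argv).
    return pipeline_module.run(argv)


# When used as a script, end the file with:
#     if __name__ == "__main__":
#         launch()
```

You would then start the job with something like `python dataflow_launcher.py --runner DataflowRunner --setup_file ./setup.py`, with jmespath and any other third-party packages listed under install_requires in setup.py.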

08-18 12:25