Problem description
We are using Google Dataflow for batch data processing and are looking for a workflow orchestration tool, something similar to what Azkaban does for Hadoop.
The key things we are looking for are:
- Configuring workflows
- Scheduling workflows
- Monitoring and alerting failed workflows
- Ability to rerun failed jobs
We have evaluated Pentaho, but these features are only available in its Enterprise edition, which is expensive. We are currently evaluating Azkaban because it supports javaprocess job types. However, Azkaban was created primarily for Hadoop jobs, so it integrates more deeply with Hadoop infrastructure than with plain javaprocess jobs.
We would appreciate suggestions for open-source or very low-cost solutions.
It sounds like Apache Airflow (https://github.com/apache/incubator-airflow) should meet your needs, and it now has a Dataflow operator (https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/operators/dataflow_operator.py).
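To show how the four requirements in the question could map onto Airflow, here is a minimal sketch rather than a tested deployment. It assumes an Airflow 1.x install where the linked contrib module exposes `DataFlowJavaOperator`; the class name and import path may differ in other releases. The DAG id, owner, email address, jar path, GCP project, and bucket are all hypothetical placeholders.

```python
from datetime import datetime, timedelta

# Settings applied to every task in the DAG. All values are illustrative.
DEFAULT_ARGS = {
    "owner": "data-eng",                    # hypothetical owner name
    "start_date": datetime(2017, 1, 1),
    "retries": 2,                           # rerun a failed task automatically
    "retry_delay": timedelta(minutes=10),
    "email": ["oncall@example.com"],        # hypothetical alert address
    "email_on_failure": True,               # alert when a task fails
}


def build_dag():
    """Build a daily DAG that launches one Dataflow batch job.

    Airflow is imported inside the function so that the settings above
    can be inspected on a machine that does not have Airflow installed.
    """
    from airflow import DAG
    # Assumption: Airflow 1.x contrib path, as linked in the answer.
    from airflow.contrib.operators.dataflow_operator import DataFlowJavaOperator

    dag = DAG(
        "dataflow_batch",                   # hypothetical DAG id
        default_args=DEFAULT_ARGS,
        schedule_interval="@daily",         # scheduling
    )
    DataFlowJavaOperator(
        task_id="run_pipeline",
        jar="/path/to/pipeline.jar",        # hypothetical path to the pipeline jar
        dataflow_default_options={
            "project": "my-gcp-project",    # hypothetical GCP project
            "stagingLocation": "gs://my-bucket/staging",  # hypothetical bucket
        },
        dag=dag,
    )
    return dag
```

In a real DAG file you would assign `dag = build_dag()` at module level so the scheduler can discover it. Configuration then lives in this Python file, `schedule_interval` covers scheduling, `email_on_failure` together with the Airflow web UI covers monitoring and alerting, and `retries` (or clearing a failed task in the UI) covers rerunning failed jobs.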