This article looks at how to schedule Spark jobs on a recurring basis; the recommended answer below may be a useful reference for anyone facing the same question.

Problem Description

Which is the recommended tool for scheduling Spark jobs on a daily/weekly basis?

1) Oozie
2) Luigi
3) Azkaban
4) Chronos
5) Airflow

Thanks.

Recommended Answer

Updating my previous answer from here: Suggestion for scheduling tool(s) for building hadoop based data pipelines

  • Airflow: Try this first. Decent UI, Python-ish job definition, semi-accessible for non-programmers; the dependency declaration syntax is weird.
    • Airflow has built-in support for the fact that scheduled jobs often need to be rerun and/or backfilled. Make sure you build your pipelines to support this.
  • Azkaban enforces simplicity (you can't use features that don't exist), while the others subtly encourage complexity.
    • Check out the Azkaban CLI project for programmatic job creation: https://github.com/mtth/azkaban (examples: https://github.com/joeharris76/azkaban_examples)
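The rerun/backfill point above can be sketched as a minimal Airflow DAG. This is an illustrative assumption, not code from the original answer: the DAG id, schedule, and `spark-submit` command are hypothetical. The key ideas are `catchup=True`, which makes Airflow create a run for every missed schedule interval since `start_date`, and the templated `{{ ds }}` execution date, so each run processes its own day's data (rather than "today's") and can be rerun or backfilled safely.

```python
# Hypothetical Airflow 2.x DAG sketch -- names and commands are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_spark_job",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=True,          # backfill every interval since start_date
    default_args={"retries": 1},
) as dag:
    # {{ ds }} is the logical execution date (YYYY-MM-DD), not "now",
    # so rerunning an old interval reprocesses exactly that day's data.
    spark_submit = BashOperator(
        task_id="spark_submit",
        bash_command="spark-submit --master yarn my_app.py --date {{ ds }}",
    )
```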

      Philosophy:

      Simpler pipelines are better than complex pipelines: easier to create, easier to understand (especially when you didn't create them), and easier to debug/fix.

      When complex actions are needed you want to encapsulate them in a way that either completely succeeds or completely fails.

      If you can make it idempotent (running it again creates identical results) then that’s even better.
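As a minimal illustration of both points (all-or-nothing encapsulation and idempotency), consider a job step that writes each day's output to a date-keyed path. The function and file layout here are hypothetical, not from the answer: writing to a temporary file and atomically replacing the target means the step either fully succeeds or leaves the old output intact, and rerunning it for the same date produces the identical result instead of appending duplicates.

```python
import os
import tempfile

def write_partition(base_dir, ds, rows):
    """Idempotently write one day's output.

    Writes to a temp file, then atomically replaces the date-keyed
    target: a rerun overwrites rather than appends, and a crash
    mid-write never leaves a half-finished partition visible.
    """
    path = os.path.join(base_dir, f"date={ds}.csv")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(rows))
    os.replace(tmp, path)  # atomic on POSIX and Windows
    return path

base = tempfile.mkdtemp()
p1 = write_partition(base, "2020-07-04", ["a,1", "b,2"])
p2 = write_partition(base, "2020-07-04", ["a,1", "b,2"])  # rerun: same result
```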

That concludes this article on scheduling Spark jobs; we hope the recommended answer above is helpful.

07-04 06:13