How to stop a DAG from backfilling? catchup_by_default = False and catchup = False do not seem to work

Problem description

Setting catchup_by_default = False in airflow.cfg does not seem to work. Adding catchup=False to the DAG doesn't work either.
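Before debugging further, it can help to confirm that the setting is actually in the section Airflow reads it from. The sketch below parses a sample config string with the standard library and assumes the option lives under [scheduler], where Airflow defines it. The sample text and the helper name catchup_by_default are illustrative only, not part of Airflow.

```python
import configparser

# Illustrative stand-in for the relevant part of airflow.cfg.
SAMPLE_CFG = """
[scheduler]
catchup_by_default = False
"""


def catchup_by_default(cfg_text):
    """Return the effective catchup_by_default value from config text.

    Falls back to True when the section or option is missing, which
    mirrors Airflow's default for this option.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getboolean("scheduler", "catchup_by_default", fallback=True)


print(catchup_by_default(SAMPLE_CFG))
```

If the option sits under a different section, this check reports the default instead, which is one way such a setting can appear to have no effect.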

Here's how to reproduce the issue. I always start from a clean slate by running airflow resetdb. As soon as I unpause the DAG, the tasks start to backfill.

Here's the setup for the DAG. I'm just using the tutorial example.

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    "owner": "airflow",
    "depends_on_past": False,
    "start_date": datetime(2018, 9, 16),
    "email": ["[email protected]"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG("tutorial", default_args=default_args, schedule_interval=timedelta(days=1), catchup=False)


Recommended answer

As @dlamblin mentioned, and as the docs also state, Airflow creates a single DagRun for the most recent valid interval. catchup=False instructs the scheduler to create a DAG Run only for the most recent instance of the DAG interval series.
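To make the "single DagRun for the most recent valid interval" behavior concrete, here is a simplified model in plain Python. It is not Airflow's scheduler code. The function runs_to_create and the "now" timestamp are assumptions made up for illustration, and the model only covers fixed timedelta intervals.

```python
from datetime import datetime, timedelta


def runs_to_create(start_date, interval, now, catchup):
    """Return the execution dates a scheduler would create (simplified).

    An interval [d, d + interval) is treated as runnable only once it
    has fully elapsed, i.e. when d + interval <= now.
    """
    dates = []
    d = start_date
    while d + interval <= now:
        dates.append(d)
        d += interval
    if catchup or not dates:
        return dates
    # With catchup disabled, keep only the most recent completed interval.
    return dates[-1:]


start = datetime(2018, 9, 16)   # start_date from the question
now = datetime(2018, 9, 20, 12, 0)  # hypothetical moment the DAG is unpaused

all_runs = runs_to_create(start, timedelta(days=1), now, catchup=True)
latest_only = runs_to_create(start, timedelta(days=1), now, catchup=False)
print(len(all_runs), latest_only)
```

In this model, one run right after unpausing is expected even with catchup=False. Seeing several historical runs is what points to a problem.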

However, there was a bug when using a timedelta for schedule_interval instead of a cron expression or cron preset. This has been fixed in Airflow master with https://github.com/apache/airflow/pull/8776. We will release Airflow 1.10.11 with this fix.
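Until an upgrade is possible, one plausible workaround is to pass a cron preset rather than a timedelta. The sketch below picks a daily schedule based on the installed version. The helpers parse_version and daily_schedule are hypothetical, it assumes a plain numeric "X.Y.Z" version string, and it assumes "@daily" is an acceptable substitute for timedelta(days=1) when start_date falls on midnight, as it does in the question.

```python
from datetime import timedelta


def parse_version(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def daily_schedule(airflow_version):
    """Pick a daily schedule_interval value for the given Airflow version.

    Versions before 1.10.11 get the cron preset, since the answer says
    the timedelta bug is fixed as of that release.
    """
    if parse_version(airflow_version) < (1, 10, 11):
        return "@daily"
    return timedelta(days=1)


print(daily_schedule("1.10.10"))
print(daily_schedule("1.10.11"))
```

The returned value would then be passed as schedule_interval when constructing the DAG, alongside catchup=False.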

