Problem description
Can someone tell me whether a DAG in Airflow is just a graph (like a placeholder) with no actual data (such as arguments) associated with it, or whether a DAG is more like an instance (for a fixed set of arguments)?
I want a system where the set of operations to perform (given a set of arguments) is fixed, but the input is different every time the operations are run. In simple terms, the pipeline is the same, but the arguments to the pipeline differ on every run.
How do I configure this in Airflow? Should I create a new DAG for every new set of arguments, or is there another method?
In my case the graph is the same, but I want to run it on different data (from different users) as it arrives. So should I create a new DAG every time new data comes in?
Recommended answer
You do not need to create a new DAG every time if the structure of the graph is the same.
Airflow DAGs are defined in code, so you are free to build a code structure that lets you pass in arguments for each run. How you do that will require some creative thinking.
You could, for example, create a web form that accepts the arguments, stores them in a database, and then triggers the DAG through the Airflow REST API. The DAG code would then need to be written to retrieve the parameters from the database.
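A minimal sketch of the triggering step, assuming Airflow 2.x's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`) with basic authentication enabled. The host, credentials, DAG id `user_pipeline`, and argument names are all hypothetical. Rather than the database round trip described above, this variant passes the arguments directly in the run's `conf` payload, which is one of the other possible approaches. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf, username, password):
    """Build (but do not send) a request that starts one DAG run with per-run arguments."""
    # Assumed endpoint: Airflow 2.x stable REST API, "trigger a new DAG run".
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # The per-run arguments travel in the "conf" field of the JSON body.
    body = json.dumps({"conf": conf}).encode("utf-8")
    # Assumes the basic-auth API backend is enabled on the Airflow webserver.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical values: one user's submission from the web form.
request = build_trigger_request(
    base_url="http://localhost:8080",
    dag_id="user_pipeline",
    conf={"user_id": "alice", "input_path": "/data/alice.csv"},
    username="admin",
    password="admin",
)
# To actually trigger the run: urllib.request.urlopen(request)
```

Each submission then becomes its own DAG run of the same DAG. If the arguments are too large or too sensitive for `conf`, the same request could carry only a database key and the tasks could look the rest up, as the answer suggests.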
There are several other ways to accomplish what you are asking; they all depend on your use case.
One caveat: the Airflow scheduler does not perform well if you change the start date of a DAG. For the idea above, you will need to set the start date earlier than your first DAG run and then turn the schedule interval off. This way the start date never changes, and DAG runs are triggered dynamically.
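The caveat can be sketched as a DAG definition with a fixed start date in the past and no schedule, so that runs happen only when triggered. This assumes Airflow 2.4 or later, where the `schedule` argument replaces the older `schedule_interval`. The DAG id, task name, and `conf` keys are hypothetical, and reading the arguments from `dag_run.conf` is just one option (a database lookup would fit in the same place).

```python
from datetime import datetime


def run_pipeline(**context):
    """Task body: the steps are fixed, the arguments come from the triggering run."""
    # dag_run.conf holds whatever was passed when this run was triggered.
    conf = context["dag_run"].conf or {}
    input_path = conf.get("input_path")
    if input_path is None:
        raise ValueError("this run was triggered without an 'input_path' argument")
    user_id = conf.get("user_id", "unknown")
    message = f"processing {input_path} for user {user_id}"
    print(message)
    return message


try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    # Airflow is absent (for example in a unit test); only run_pipeline is usable.
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="user_pipeline",           # hypothetical name
        start_date=datetime(2020, 1, 1),  # fixed, earlier than any run; never edited
        schedule=None,                    # no schedule: runs are triggered externally
        catchup=False,
    ) as dag:
        PythonOperator(task_id="run_pipeline", python_callable=run_pipeline)
```

With this layout there is one DAG for all users, and every triggered run carries its own arguments.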