Problem Description
If we maintain our code/scripts in a GitHub repository, is there any way to copy these scripts from the GitHub repository and execute them on some other cluster (which can be Hadoop or Spark)?
Does Airflow provide any operator to connect to GitHub for fetching such files?
Maintaining scripts in GitHub will provide more flexibility, as every change in the code will be reflected and used directly from there.
Any idea on this scenario would really help.
Recommended Answer
You can use GitPython as part of a PythonOperator task to run the pull on a specified schedule.
import git

# git_dir: path to an existing local clone of the repository
g = git.cmd.Git(git_dir)
g.pull()
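For context, here is a minimal sketch of how that pull might be wired into a DAG. The repository path, DAG name, schedule, and Airflow 2.x-style imports are illustrative assumptions, not part of the original answer.

import git
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical path to a pre-cloned checkout of the GitHub repository.
REPO_DIR = "/opt/scripts/my-repo"

def pull_repo():
    # Pull the latest scripts into the local checkout.
    git.cmd.Git(REPO_DIR).pull()

with DAG(
    dag_id="sync_github_scripts",  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",   # run the pull every hour
    catchup=False,
) as dag:
    PythonOperator(
        task_id="git_pull",
        python_callable=pull_repo,
    )

Downstream tasks (for example, the ones submitting the scripts to the Hadoop or Spark cluster) can then depend on git_pull so they always run against the latest code.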
Don't forget to make sure that you have added the relevant keys so that the Airflow workers have permission to pull from the repository.
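If the workers authenticate over SSH, one possible way to handle this (an assumption, not something the original answer specifies) is to point GitPython at a dedicated deploy key via git's GIT_SSH_COMMAND variable:

import git

g = git.cmd.Git("/opt/scripts/my-repo")  # placeholder checkout path
# Use a dedicated read-only deploy key for the pull (key path is a placeholder).
with g.custom_environment(GIT_SSH_COMMAND="ssh -i /home/airflow/.ssh/deploy_key"):
    g.pull()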