Airflow successfully writes and reads from S3 but won't load S3 logs on docker

This article looks at a case where Airflow successfully writes to and reads from S3 but won't load the S3 logs on docker, and how it was resolved; it may be a useful reference for anyone hitting the same problem.

Problem description

I'm using puckel's airflow docker (github link) with docker-compose-LocalExecutor. The project is deployed through CI/CD on an EC2 instance, so my airflow doesn't run on a persistent server (on every push to master it gets launched afresh). I know I'm losing some great features, but in my setup everything is configured by bash script and/or environment variables. My setup is similar to the one in this answer: Similar setup answer

I'm running on version 1.10.6, so the old method of adding config/__init__.py and config/log_class.py is not needed anymore.

Changes I made on the original repository code:

  1. I added some environment variables and changed the build mode in docker-compose-LocalExecutor.yml to write/save logs to S3 and build from the local Dockerfile (see the note on the connection URI right after this list):

    webserver:
        build: .
        environment:
            - AIRFLOW__CORE__REMOTE_LOGGING=True
            - AIRFLOW__CORE__REMOTE_LOG_CONN_ID=aws_default
            - AIRFLOW__CORE__REMOTE_BASE_LOG_FOLDER=s3://path/to/logs
            - AIRFLOW_CONN_AWS_DEFAULT=s3://key:password
    

  2. I changed the Dockerfile on line 59 to install the s3 extra, as shown below:

    && pip install apache-airflow[s3,crypto,celery,password,postgres,hive,jdbc,mysql,ssh${AIRFLOW_DEPS:+,}${AIRFLOW_DEPS}]==${AIRFLOW_VERSION} \
    
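One note on step 1: Airflow parses AIRFLOW_CONN_* environment variables as connection URIs, so any special characters in the AWS secret key have to be URL-encoded. A minimal sketch with hypothetical placeholder credentials:

    environment:
        # Hypothetical credentials, for illustration only. URL-encode
        # special characters in the secret key, e.g. "/" -> "%2F" and
        # "+" -> "%2B", otherwise the connection URI will not parse.
        - AIRFLOW_CONN_AWS_DEFAULT=s3://AKIAEXAMPLE:secret%2Fkey%2Bvalue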

Those configurations work fine; the logs are written to and read from S3 successfully.
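
A quick way to sanity-check that the s3 extra actually made it into the image (a suggested check, not part of the original setup; it assumes puckel's entrypoint passes arbitrary commands through, and uses the 1.10.x import path for S3Hook):

    docker-compose -f docker-compose-LocalExecutor.yml run --rm webserver \
        python -c "from airflow.hooks.S3_hook import S3Hook; print('s3 extra installed')"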

My problem is:

If I run docker-compose down and then docker-compose up, the UI appears as if the DAGs have never run before (the UI won't load the DAGs' remote logs).

Solution

Turns out that run events are not rebuilt from the log files. All the metadata (run events, users, xconn, etc.) is stored in an external database. Once I configured a connection to an external database, everything worked fine!
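
The answer doesn't show the change itself, so here is a minimal sketch of what it could look like in docker-compose-LocalExecutor.yml, assuming a hypothetical external Postgres endpoint (in Airflow 1.10.x the metadata database is configured through AIRFLOW__CORE__SQL_ALCHEMY_CONN):

    webserver:
        build: .
        environment:
            # Hypothetical external endpoint (e.g. an RDS instance).
            # Because the database outlives the containers, run history,
            # users and connections survive docker-compose down/up.
            - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@my-db.example.com:5432/airflow
            - AIRFLOW__CORE__EXECUTOR=LocalExecutor

With the metadata database external, the postgres service bundled in the compose file is no longer the source of truth and can be removed or simply left unused.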

That's all for this article on Airflow successfully writing to and reading from S3 but not loading the S3 logs on docker. We hope the answer above is helpful, and thank you for your support!
