How to Install Apache Airflow in a Docker Container on an EC2 Machine
In this article, we’ll cover how to run Apache Airflow in Docker containers on an EC2 machine in three simple steps:
- Launch an EC2 instance
- Install Docker and Docker Compose on that instance
- Create a docker-compose.yml file and start the Docker containers
Go to the AWS console, search for EC2 in the search bar, and click the Launch instance button.
Launch an Amazon Linux 2 AMI and configure it according to your workload: instance type, EBS volumes, VPC, and so on.
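If you prefer the command line, a rough AWS CLI equivalent is sketched below; the AMI ID, key-pair name, security group, and subnet are placeholders you must replace with your own values.
aws ec2 run-instances \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type t3.medium \
  --key-name my-airflow-key \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --subnet-id subnet-xxxxxxxxxxxxxxxxx \
  --count 1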
Once the EC2 instance is running, connect to it with PuTTY and your .ppk SSH key (or any other SSH client, as shown below).
Once you are connected to the EC2 terminal, first install Docker with the following commands.
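On Linux, macOS, or Windows with OpenSSH, a plain ssh command also works; the .pem key file name here is a placeholder for the key pair you downloaded when launching the instance.
ssh -i my-airflow-key.pem ec2-user@<EC2_PUBLIC_IP>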
# install Docker from the Amazon Linux extras repository
sudo amazon-linux-extras install docker
# start the Docker daemon
sudo service docker start
# let ec2-user run docker without sudo (log out and back in for this to take effect)
sudo usermod -a -G docker ec2-user
Then install Docker Compose with the following command.
sudo curl -L https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m) -o /usr/bin/docker-compose && sudo chmod 755 /usr/bin/docker-compose && docker-compose --version
Once Docker and Docker Compose are installed, make sure the Docker service is running on the EC2 instance. The following commands check its status and start, stop, or restart it.
sudo systemctl status docker
sudo systemctl start docker
sudo systemctl stop docker
sudo systemctl restart docker
Create a docker-compose.yml file that defines the Airflow containers. For reference, the full code is on GitHub: https://github.com/datainteg/airflow_2.5.3/
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.5.3}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./:/opt/airflow
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:latest
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - ./postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    restart: always

volumes:
  postgres-db-volume:
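Before starting anything, you can optionally let Docker Compose parse and validate the file; it prints the resolved configuration, or an error if the YAML is malformed.
sudo docker-compose config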
Next, initialize the Airflow metadata database with the following command.
sudo docker-compose up airflow-init
# if this command does not work or gets stuck, it usually means a permissions error.
# grant permissions on the directory with the following command
sudo chmod 777 *
# then run the same command again
sudo docker-compose up airflow-init
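As an alternative to chmod 777, the compose file above reads AIRFLOW_UID from the environment, so you can run the containers as your own user by writing your UID into a .env file next to docker-compose.yml. This is only a sketch based on that variable, not a required step.
# run the Airflow containers as the current user instead of the default UID 50000
echo "AIRFLOW_UID=$(id -u)" > .env
sudo docker-compose up airflow-init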
Once this step is complete, run the following commands to start all containers and check whether they are running.
sudo docker-compose up -d
# this command shows the status of each container (running or restarting)
sudo docker ps
# if a container keeps restarting, inspect its logs with the following command
sudo docker logs container_id
If an Airflow container keeps restarting, don’t worry; this happens sometimes. First identify the error from the logs using the command above.
If it turns out to be a permission error, grant full permissions on the directory with one of the commands below.
sudo chmod 777 -R /home/ec2-user/airflow
# or
sudo chmod 777 -R *
Once all the above steps are done, you also need to whitelist port 8080 for the webserver by adding an inbound rule to the instance's security group.
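You can add the rule in the console (EC2 → Security Groups → Edit inbound rules) or with the AWS CLI as sketched below; the security group ID is a placeholder, and 0.0.0.0/0 opens the port to the whole internet, so restrict the CIDR to your own IP where possible.
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxxxxxxxxxxxxxxxx \
  --protocol tcp \
  --port 8080 \
  --cidr 0.0.0.0/0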
Congratulations, you have successfully installed Apache Airflow in Docker containers on an AWS EC2 machine.
Find the instance's public IP address and open the webserver at Public_IP:8080.
In my case that was http://43.205.237.112:8080 (username: airflow, password: airflow).
Add your DAG scripts to the dags directory and give each .py file 777 permissions, for example: chmod 777 my_first_dag.py
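Because the compose file mounts the project directory into the containers (./:/opt/airflow), a dags folder next to docker-compose.yml is picked up as the DAGs folder. A quick sketch, assuming the project lives in /home/ec2-user/airflow:
# create the dags folder, drop your DAG file in, and open up its permissions
mkdir -p /home/ec2-user/airflow/dags
cp my_first_dag.py /home/ec2-user/airflow/dags/
chmod 777 /home/ec2-user/airflow/dags/my_first_dag.py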
If anything is wrong, please tell me so I can update the content and keep this article accurate for as long as possible 🙏
https://medium.com/@akshay03/task-group-of-apache-airflow-in-5-minutes-e4136eb4a2f7
Thank you !!!