How to Install Apache Airflow in a Docker Container on an EC2 Machine

Akshay Thakare
5 min read · May 30, 2023



In this article, we’ll cover how to run Apache Airflow in Docker containers on an EC2 machine in three simple steps.

  1. Launch an EC2 instance.
  2. Install Docker and Docker Compose on that instance.
  3. Create a docker-compose.yml file and start the Docker containers.

Go to the AWS console, search for EC2 in the search bar, and click the Launch instance button.

AWS Console
EC2 Dashboard

Launch an Amazon Linux 2 AMI and add configuration according to your workload, such as instance type, EBS volumes, VPC, etc.

EC2 Instance Launch interface
EC2 Instance on Running State
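If you prefer the command line, you can launch a comparable instance with the AWS CLI instead of the console. This is only a sketch; the AMI ID, key pair, and security group below are placeholders you would replace with your own values.

# launch an Amazon Linux 2 instance (placeholder AMI ID, key pair, and security group)
aws ec2 run-instances \
    --image-id ami-xxxxxxxxxxxx \
    --instance-type t2.medium \
    --key-name my-key-pair \
    --security-group-ids sg-xxxxxxxx \
    --count 1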

Once the EC2 instance has launched successfully, connect to it using PuTTY and the .ppk SSH key.

Putty
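If you are not on Windows, or simply prefer a terminal, you can also connect with plain ssh using the .pem version of the key. The key file name and IP below are placeholders.

# connect as ec2-user using your private key (placeholder key file and IP)
ssh -i my-key.pem ec2-user@<public-ip>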

With the EC2 terminal connected through PuTTY, first install Docker on the instance using the following commands.

# install Docker from the Amazon Linux Extras repository
sudo amazon-linux-extras install docker
# start the Docker daemon
sudo service docker start
# let ec2-user run docker commands without sudo
sudo usermod -a -G docker ec2-user
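After adding ec2-user to the docker group, log out and back in (or start a new group session) so the change takes effect, then verify the installation. A quick sketch:

# pick up the new docker group membership without logging out
newgrp docker

# confirm Docker is installed and can run containers
docker --version
docker run hello-world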

Then install Docker Compose with the following command.

sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/bin/docker-compose \
  && sudo chmod 755 /usr/bin/docker-compose \
  && docker-compose --version

Once Docker and Docker Compose are installed, make sure the Docker service is running on the EC2 instance. The following commands check, start, stop, and restart it.

sudo systemctl status docker
sudo systemctl start docker
sudo systemctl stop docker
sudo systemctl restart docker
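It is also worth enabling the Docker service so it starts automatically if the instance reboots:

# start Docker automatically on boot
sudo systemctl enable docker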

Create a docker-compose.yml file to define the Airflow containers. For reference, the code is on GitHub: https://github.com/datainteg/airflow_2.5.3/

version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.5.3}
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./:/opt/airflow
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: postgres:latest
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      - ./postgres-db-volume:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: redis:latest
    ports:
      - 6379:6379
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    command: webserver
    ports:
      - 8080:8080
    restart: always

  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  airflow-init:
    <<: *airflow-common
    command: version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}

  flower:
    <<: *airflow-common
    command: celery flower
    ports:
      - 5555:5555
    restart: always

volumes:
  postgres-db-volume:
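Because the compose file mounts the project directory into the containers and runs Airflow as the AIRFLOW_UID user, it can help to create the expected folders and pin the UID before starting anything. A minimal sketch, assuming the compose file lives in /home/ec2-user/airflow:

cd /home/ec2-user/airflow

# folders Airflow will look for inside the mounted project directory
mkdir -p ./dags ./logs ./plugins

# run the containers as the current host user so file permissions line up
echo "AIRFLOW_UID=$(id -u)" > .env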

Next, initialize the Airflow metadata database with the following command.

sudo docker-compose up airflow-init

# if this command hangs or fails, it is usually a permissions issue;
# grant full permissions on the project directory and retry
sudo chmod 777 *

# re-run the same command
sudo docker-compose up airflow-init
Airflow Metadata Initialization
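chmod 777 works, but it opens the directory to everyone. A slightly safer alternative, assuming the stack uses the default AIRFLOW_UID and AIRFLOW_GID of 50000 from the compose file above, is to hand ownership of the project directory to that user instead:

# give the Airflow container user (UID/GID 50000 by default) ownership of the project files
sudo chown -R 50000:50000 /home/ec2-user/airflow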

Once this step completes, run the following command to start all the containers, and then check whether they are running.

sudo docker-compose up -d

# show the status of the containers (running or restarting)
sudo docker ps

# if a container keeps restarting, inspect its logs
sudo docker logs <container_id>
Output of docker ps
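docker-compose can also tail the logs of a single service, which is often easier than hunting for container IDs. For example, for the webserver:

# follow the logs of one service defined in docker-compose.yml
sudo docker-compose logs -f airflow-webserver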

Here, the screenshot shows an Airflow container stuck in a restarting state. Don’t worry, this happens from time to time; first identify the error by checking the logs with the command above.

Expected error

This type of error is usually caused by missing permissions. Grant full permissions on the project directory with one of the commands below.

sudo chmod 777 -R /home/ec2-user/airflow
# or
sudo chmod 777 -R *
Running Container list

After completing all the steps above, you also need to whitelist port 8080 for the webserver by adding an inbound rule to the instance’s security group.

whitelisting port 8080 for airflow webserver
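You can add the rule in the console as shown above, or with the AWS CLI. The security group ID below is a placeholder, and restricting the source CIDR to your own IP is safer than opening the port to 0.0.0.0/0.

# open port 8080 in the instance's security group (placeholder group ID and CIDR)
aws ec2 authorize-security-group-ingress \
    --group-id sg-xxxxxxxx \
    --protocol tcp \
    --port 8080 \
    --cidr 203.0.113.0/32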

Congratulations, you have successfully installed Apache Airflow in Docker containers on an AWS EC2 machine.

Find the instance’s public IP address and open the webserver URL: http://<Public_IP>:8080

In my case, http://43.205.237.112:8080 (username: airflow, password: airflow).

Login page of airflow webserver
airflow webserver
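From the instance itself you can also confirm the webserver is up before opening it in a browser; Airflow exposes a /health endpoint:

# should return JSON with healthy statuses once the webserver and scheduler are up
curl http://localhost:8080/health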

Add your DAG scripts to the dags directory and give the .py files 777 permission, for example: chmod 777 my_first_dag.py
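For reference, here is a minimal example DAG you could drop into the dags directory, written as a shell heredoc so it can be pasted straight into the EC2 terminal. The DAG ID, schedule, and file name are just examples.

cd /home/ec2-user/airflow/dags

# create a minimal example DAG with a single BashOperator task
cat > my_first_dag.py <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="my_first_dag",
    start_date=datetime(2023, 5, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from Airflow on EC2!'",
    )
EOF

chmod 777 my_first_dag.py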

If something is wrong, please tell me so I can update the content and keep this article accurate for as long as possible 🙏

https://medium.com/@akshay03/task-group-of-apache-airflow-in-5-minutes-e4136eb4a2f7

Thank you !!!
