Part 5: Airflow Architecture

Jyoti Sachdeva
2 min read · Jul 23, 2021

Hi,

In this blog, we will discuss the Airflow architecture.

The main components of Airflow are:

Web Server: Users interact with Airflow through the web server. It serves the Airflow UI, which gives an overview of DAGs, their tasks, and their states.

The web server can also manage users, roles, and various configurations.

The web server reads everything it shows in the UI from the metadata database.

Likewise, any action performed from the UI, such as creating a new user, updates the metadata database.

Scheduler: The scheduler keeps polling the DAGs folder, and whenever it finds a new DAG, it updates the metadata database. It also checks the metadata database to decide which DAGs are due to run, then picks up their tasks and queues them to the executor. It sets the state in the metadata database to scheduled; the UI polls the metadata database, so we see the updated states there.
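To make the scheduler's loop concrete, here is a minimal sketch in plain Python. It does not use any Airflow API: the `metadata` dict, the `executor_queue`, the `dags_folder` dict, and both function names are assumptions made up for illustration, and the real scheduler is considerably more involved.

```python
from collections import deque

# Toy stand-ins, not Airflow APIs: a dict plays the metadata database
# and a deque plays the queue handed to the executor.
metadata = {"dags": {}, "task_states": {}}
executor_queue = deque()


def scan_dags_folder(dags_folder):
    """Register any DAG that is not yet recorded in the metadata store."""
    for dag_id, task_ids in dags_folder.items():
        if dag_id not in metadata["dags"]:
            metadata["dags"][dag_id] = list(task_ids)
            for task_id in task_ids:
                metadata["task_states"][(dag_id, task_id)] = "none"


def schedule_due_tasks():
    """Mark unscheduled tasks as 'scheduled' and queue them for the executor."""
    for key, state in metadata["task_states"].items():
        if state == "none":
            metadata["task_states"][key] = "scheduled"
            executor_queue.append(key)


# One pass of the loop; a real scheduler repeats this continuously.
dags_folder = {"example_dag": ["extract", "load"]}  # hypothetical DAG
scan_dags_folder(dags_folder)
schedule_due_tasks()
print(metadata["task_states"])
print(list(executor_queue))
```

Scanning the same folder again does not register the DAG twice, which mirrors the idea that only new DAGs trigger a metadata update.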

Executor: The executor is a message queuing process that is bound to the scheduler. It updates the task state in the metadata database (running/failure/success). If a task ends up in the retry state, the scheduler picks it up again after the retry interval.

The executor determines the worker processes that actually execute each scheduled task.

There are different types of executors, which we will discuss in later parts.

Workers: These are the processes that actually execute the logic of tasks. Which workers run a task is determined by the executor in use.
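The split between executor and workers can be sketched with the standard library's thread pool. This is only an analogy, not how any particular Airflow executor is implemented: `task_states`, `run_task`, `execute`, and the two example tasks are hypothetical names, and the thread pool stands in for whatever worker processes a real executor would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy metadata store: task name -> state. Not an Airflow API.
task_states = {}


def extract():
    return "rows"


def broken_load():
    raise RuntimeError("target unavailable")


def run_task(name, func):
    """Worker side: run the task logic and report the outcome."""
    try:
        func()
        return name, "success"
    except Exception:
        return name, "failed"


def execute(queued_tasks, max_workers=2):
    """Executor side: hand queued tasks to workers and record their states."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for name, func in queued_tasks:
            task_states[name] = "running"
            futures.append(pool.submit(run_task, name, func))
        for future in futures:
            name, state = future.result()
            task_states[name] = state


execute([("extract", extract), ("load", broken_load)])
print(task_states)
```

The executor never runs task logic itself here; it only decides where tasks run and writes the resulting state back, which is the division of labour described above.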

Metadata Database: This is used by the scheduler, executor, and web server to store state. It stores information about tasks, DAGs, states, pools, users, and so on.

Database updates are performed through an abstraction layer implemented with SQLAlchemy. This layer separates the rest of Airflow's components from the database.
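The idea of an abstraction layer between the components and the database can be illustrated with a small sketch. Note the swap: this example uses Python's built-in `sqlite3` module instead of SQLAlchemy so that it runs without extra packages, and the `StateStore` class, its methods, and the table layout are hypothetical, not Airflow's actual models.

```python
import sqlite3


class StateStore:
    """Hypothetical thin layer; callers never write SQL themselves."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS task_instance ("
            "dag_id TEXT, task_id TEXT, state TEXT, "
            "PRIMARY KEY (dag_id, task_id))"
        )

    def set_state(self, dag_id, task_id, state):
        # Insert the row, or overwrite the state if the row already exists.
        self._conn.execute(
            "INSERT OR REPLACE INTO task_instance VALUES (?, ?, ?)",
            (dag_id, task_id, state),
        )
        self._conn.commit()

    def get_state(self, dag_id, task_id):
        row = self._conn.execute(
            "SELECT state FROM task_instance WHERE dag_id = ? AND task_id = ?",
            (dag_id, task_id),
        ).fetchone()
        return row[0] if row else None


store = StateStore()
store.set_state("example_dag", "extract", "scheduled")  # scheduler's write
store.set_state("example_dag", "extract", "running")    # executor's write
print(store.get_state("example_dag", "extract"))        # web server's read
```

Because the scheduler, executor, and web server would only call `set_state` and `get_state`, the storage behind them could change without touching those components, which is the benefit the abstraction layer is meant to provide.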

Thank you for reading.
