Apache Airflow

Intermediate

Apache Airflow is a platform to design, schedule, and monitor workflows. Workflows are modeled as directed acyclic graphs (DAGs) and written as Python code. Airflow is designed to run workflows either on regular schedules or on demand; it is therefore not well suited for tasks that must be triggered in an event-based fashion. The tasks within a workflow are run by so-called executors. Most of the executors shipped with Airflow are designed to interact with a cluster so that the load is distributed over its worker nodes (e.g. the Celery, Dask, or Kubernetes executors).
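
As a sketch of what such a workflow looks like in code, the following defines a small ETL-style DAG with the TaskFlow API. The DAG id, schedule, and task bodies are illustrative placeholders, and the example assumes Airflow 2.4 or newer (for the schedule argument):

# A minimal sketch of an Airflow DAG, assuming Airflow 2.4+ with the TaskFlow API.
# DAG id, schedule, and task contents are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_etl",            # hypothetical DAG id
    schedule="@daily",               # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def example_etl():
    @task
    def extract() -> list[int]:
        # Placeholder for reading from a source system.
        return [1, 2, 3]

    @task
    def transform(rows: list[int]) -> list[int]:
        # Placeholder for a transformation step.
        return [r * 2 for r in rows]

    @task
    def load(rows: list[int]) -> None:
        # Placeholder for writing to a target system.
        print(rows)

    # Chaining the tasks defines the edges of the directed acyclic graph.
    load(transform(extract()))


example_etl()

The scheduler evaluates this DAG according to its schedule, and each task instance is then handed to the configured executor for execution.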

I have used Apache Airflow to build, manage, and oversee the execution of ETL pipelines in data engineering projects.