Open Sourcing Airflow Local Development
At GoDataDriven we love open source software. Besides encouraging its use, we also like to give back to the community, which is why we introduced the Open Source Initiatives in 2019. Anyone on the team may propose an idea for an open source contribution, whether to an existing project or something entirely new. We pitch these ideas to the rest of the team, and everyone votes for their favorite initiative.
In the first quarter of 2019 a couple of initiatives got started, including ours. Our proposal was to make it easy to run and develop Airflow workflows on your local machine, giving you fast feedback on whether the changes you made to your DAG work as intended. Think of it as an integration test environment for developing Airflow DAGs.
We combined our experience doing something similar at various clients to come up with a generic solution, which we dubbed whirl.
The idea of whirl is pretty simple: use Docker containers to start up Apache Airflow and the other components used in your workflow. This gives you a copy of your production environment running on your local machine, which allows you to run your DAG locally from start to finish, with the same code as in production. Seeing your pipeline succeed gives you more confidence in the logic you are creating or refactoring and in the integration between the different components involved. An additional benefit is that it gives (new) developers an isolated environment to experiment with your workflows.
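To give a feel for this setup, here is a minimal Docker Compose sketch of such a local environment. This is an illustration only, not whirl's actual configuration: the service names, images, and the choice of LocalStack as a mock S3 component are assumptions for the example.

```yaml
# Illustrative sketch only: service names, images, and ports are
# assumptions, not whirl's actual setup.
version: "3"
services:
  airflow:
    image: apache/airflow:2.7.3
    ports:
      - "8080:8080"          # Airflow webserver UI on localhost
    depends_on:
      - postgres
  postgres:
    image: postgres:13        # Airflow metadata database
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  s3-mock:
    image: localstack/localstack   # stand-in for an S3 dependency in the DAG
```

The point is that every external system your DAG talks to gets a local container, so the whole pipeline can run end to end on your machine.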
whirl connects the code of your DAG and your (mock) data to the Apache Airflow container it spins up. Using volume mounts, you can make changes to your code in your favorite IDE and immediately see the effect in the running Apache Airflow UI on your machine. This even works for custom Python modules that you develop and use in your DAGs.
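As an illustration of the volume-mount approach, the snippet below bind-mounts local directories into the Airflow container; the host paths and container paths are hypothetical examples, not whirl's actual layout.

```yaml
# Hypothetical volume mounts: local DAG code, mock data, and custom
# modules are bind-mounted into the container, so edits made in your
# IDE are visible inside Airflow without rebuilding the image.
services:
  airflow:
    image: apache/airflow:2.7.3
    volumes:
      - ./dags:/opt/airflow/dags           # your DAG definitions
      - ./mock-data:/data                   # (mock) input data
      - ./my_modules:/opt/airflow/plugins   # custom Python modules
```

Because the files are mounted rather than copied, saving a file on the host is enough for the scheduler to pick up the change.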
You can find the project on our GitHub: https://github.com/godatadriven/whirl. For instructions on how to use whirl, please have a look at the project's README.
Having a week to focus specifically on this project helped us open source our idea in a relatively short period of time. By combining ideas from several projects, we think we've arrived at an even better project structure. We will be using whirl at new clients, and hopefully it will be of use to other Airflow developers out there as well.
By the way, we offer an Airflow course to teach you the internals, terminology, and best practices of working with Airflow, with hands-on experience in writing and maintaining data pipelines.