Open Sourcing Airflow Local Development
At GoDataDriven we love open source software. Besides stimulating the use of open source software, we also love to give back to the community. That is why we introduced the Open Source Initiatives in 2019. Anyone in the team may propose an idea for an open source contribution, be it to an existing project or something entirely new. We pitch those ideas to the rest of the team and people vote for their favorite initiative.
In the first quarter of 2019 a couple of initiatives got started, including ours. Our proposal was to make it easy to run and develop Airflow workflows on your local machine. This gives you fast feedback on whether the changes you made to your DAG work. Think of it as your integration test environment for developing Airflow DAGs.
We combined our experience of building similar setups at various clients into a generic solution, which we dubbed whirl.
The idea of whirl is pretty simple: use Docker containers to start up Apache Airflow and the other components used in your workflow. This gives you a copy of your production environment that runs on your local machine, so you can run your DAG from start to finish with the same code as in production. Seeing your pipeline succeed gives you more confidence in the logic you are creating or refactoring, and in the integration between the components your pipeline touches. An additional benefit is that (new) developers get an isolated environment to experiment with your workflows.
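To make the idea concrete, here is a minimal sketch of what "start Airflow and the other components in containers" can boil down to. It is not whirl's implementation: the `Component` spec, the network name `whirl-local`, the Postgres stand-in and the image names are illustrative assumptions, and passing `standalone` to the `apache/airflow` image assumes an Airflow 2.x image. The function only builds `docker` CLI argument lists; it does not run them.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Component:
    """One container in the local environment (hypothetical spec, not whirl's format)."""
    name: str
    image: str
    env: dict[str, str] = field(default_factory=dict)
    ports: list[str] = field(default_factory=list)
    command: list[str] = field(default_factory=list)


def startup_commands(network: str, components: list[Component]) -> list[list[str]]:
    """Return docker CLI invocations that start every component on one shared network.

    Each command is an argument list suitable for subprocess.run(); nothing is executed here.
    """
    commands = [["docker", "network", "create", network]]
    for c in components:
        cmd = ["docker", "run", "--detach", "--name", c.name, "--network", network]
        for key, value in sorted(c.env.items()):
            cmd += ["--env", f"{key}={value}"]
        for mapping in c.ports:
            cmd += ["--publish", mapping]  # e.g. expose the Airflow UI on the host
        cmd.append(c.image)
        cmd += c.command
        commands.append(cmd)
    return commands


# Illustrative stack (assumption): Airflow plus a Postgres database standing in
# for a production data source.
STACK = [
    Component("postgres", "postgres:16", env={"POSTGRES_PASSWORD": "local-only"}),
    Component("airflow", "apache/airflow", ports=["8080:8080"], command=["standalone"]),
]

if __name__ == "__main__":
    for cmd in startup_commands("whirl-local", STACK):
        print(shlex.join(cmd))
```

Because the containers share a user-defined Docker network, a DAG running in the `airflow` container can reach the database by its container name (`postgres`), much as it would reach a named host in production.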
whirl connects the code of your DAG and your (mock) data to the Apache Airflow container that it spins up. By using volume mounts, you are able to make changes to your code in your favorite IDE and immediately see the effect in the running Apache Airflow UI on your machine. This even works with custom Python modules that you are developing and using in your DAGs.
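The volume-mount idea can be sketched as a helper that produces the `docker run` flags for the local project folders. This is a hedged illustration, not whirl's code: `/opt/airflow/dags` is the DAGs folder of the official `apache/airflow` image, while the `/opt/airflow/mock-data` and `/opt/airflow/src` targets and the `mount_args` helper are hypothetical. Exposing custom modules through `PYTHONPATH` is one possible approach among several.

```python
from pathlib import Path

# Container-side targets. CONTAINER_DAGS matches the official apache/airflow image;
# the other two paths are hypothetical choices for this sketch.
CONTAINER_DAGS = "/opt/airflow/dags"
CONTAINER_DATA = "/opt/airflow/mock-data"
CONTAINER_SRC = "/opt/airflow/src"


def mount_args(dag_dir: Path, data_dir: Path, src_dir: Path) -> list[str]:
    """Return `docker run` flags that bind-mount local project folders into the container.

    Host paths are resolved to absolute paths: `-v` interprets a bare name such as
    `dags` as a named Docker volume rather than a folder on your machine.
    """
    mounts = [
        (dag_dir, CONTAINER_DAGS),   # edits in your IDE show up in the Airflow UI
        (data_dir, CONTAINER_DATA),  # mock data the DAG reads instead of production data
        (src_dir, CONTAINER_SRC),    # custom Python modules used by the DAGs
    ]
    args: list[str] = []
    for host, container in mounts:
        args += ["--volume", f"{host.resolve()}:{container}"]
    # Put the custom modules on the import path so DAG files can import them.
    args += ["--env", f"PYTHONPATH={CONTAINER_SRC}"]
    return args
```

The flags go between `docker run` and the image name, for example `["docker", "run", *mount_args(Path("dags"), Path("mock-data"), Path("src")), "apache/airflow", "standalone"]`. Since these are bind mounts rather than copies, a file saved on the host is immediately visible inside the container.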
You can find the project on our GitHub: https://github.com/godatadriven/whirl. For instructions on how to use whirl, have a look at the project's README.
Having a dedicated week to focus on this project helped us open source our idea in a relatively short time. By combining ideas from several projects, we think we have arrived at an even better project structure. We will be using whirl at new clients, and we hope it will be of use to other Airflow developers out there as well.