Open Sourcing Airflow Local Development
At GoDataDriven we love open source software. Besides encouraging its use, we also like to give back to the community, which is why we introduced the Open Source Initiatives in 2019. Anyone on the team may propose an idea for an open source contribution, whether to an existing project or something entirely new. We pitch these ideas to the rest of the team, and everyone votes for their favorite initiative.
In the first quarter of 2019 a couple of initiatives got started, including ours. Our proposal was to make it easy to run and develop Airflow workflows on your local machine, giving you fast feedback on whether the changes you made to your DAG work as intended. Think of it as an integration test environment for developing Airflow DAGs.
We combined our experience doing something similar at various clients to come up with a generic solution, which we dubbed whirl.
The idea of whirl is pretty simple: use Docker containers to start up Apache Airflow and the other components used in your workflow. This gives you a copy of your production environment running on your local machine, which allows you to run your DAG locally from start to finish, with the same code as in production. Seeing your pipeline succeed gives you more confidence in the logic you are creating or refactoring and in the integration between the different components involved. An additional benefit is that it gives (new) developers an isolated environment to experiment with your workflows.
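To give a feel for this setup, here is a minimal Docker Compose sketch of such a local environment. This is an illustration only, not whirl's actual configuration: the service names, images, and the choice of LocalStack as a mock S3 component are assumptions for the example.

```yaml
# Illustrative sketch only: service names, images, and ports are
# assumptions, not whirl's actual setup.
version: "3"
services:
  airflow:
    image: apache/airflow:2.7.3
    ports:
      - "8080:8080"          # Airflow webserver UI on localhost
    depends_on:
      - postgres
  postgres:
    image: postgres:13        # Airflow metadata database
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  s3-mock:
    image: localstack/localstack   # stand-in for an S3 dependency in the DAG
```

The point is that every external system your DAG talks to gets a local container, so the whole pipeline can run end to end on your machine.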
whirl connects the code of your DAG and your (mock) data to the Apache Airflow container it spins up. Using volume mounts, you can make changes to your code in your favorite IDE and immediately see the effect in the running Apache Airflow UI on your machine. This even works for custom Python modules that you develop and use in your DAGs.
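As an illustration of the volume-mount approach, the snippet below bind-mounts local directories into the Airflow container; the host paths and container paths are hypothetical examples, not whirl's actual layout.

```yaml
# Hypothetical volume mounts: local DAG code, mock data, and custom
# modules are bind-mounted into the container, so edits made in your
# IDE are visible inside Airflow without rebuilding the image.
services:
  airflow:
    image: apache/airflow:2.7.3
    volumes:
      - ./dags:/opt/airflow/dags           # your DAG definitions
      - ./mock-data:/data                   # (mock) input data
      - ./my_modules:/opt/airflow/plugins   # custom Python modules
```

Because the files are mounted rather than copied, saving a file on the host is enough for the scheduler to pick up the change.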
You can find the project on our GitHub: https://github.com/godatadriven/whirl. For instructions on how to use whirl, please have a look at the project's README.
Having a week to focus specifically on this project helped us open source our idea in a relatively short period of time. By combining ideas from several projects, we think we've arrived at an even better project structure. We will be using whirl at new clients, and hopefully it will be of use to other Airflow developers out there as well.
By the way, we offer an Airflow course to teach you the internals, terminology, and best practices of working with Airflow, with hands-on experience in writing and maintaining data pipelines.