GoDataDriven Blog

Welcome to the GoDataDriven blog.

This is the place where we share our knowledge and opinions. We will try to post new content regularly.
Enjoy!
The GoDataDriven team

Testing and debugging Apache Airflow

22 Feb

One of the questions I get asked most about Apache Airflow is how to shorten the development cycle. Instead of pushing code, deploying it, and manually triggering a DAG to verify it, developers want something they can test locally without running on a live system. In this blog post I provide several pointers for testing and debugging Apache Airflow on your local machine.


The Zen of Python and Apache Airflow

18 Feb

Apache Airflow is a Python framework for programmatically creating workflows as DAGs. This allows for concise and flexible scripts, but it can also be Airflow's downside: since it is Python code, there are infinite ways to define your pipelines. The Zen of Python is a list of 19 Python design principles, and in this blog post I apply some of these principles to four Airflow examples.


AWS Machine Learning Competency Status for GoDataDriven

14 Feb


GoDataDriven Open Source Contribution for January 2019, the Apache Edition

13 Feb

Apache edition? Yes, we did a ton of work on Apache-related projects. And by we I mean…


Our social responsibility as a company

08 Feb

With an ever-increasing number of scandals around the practices of data-driven companies, where does GoDataDriven stand?


Keras: multi-label classification with ImageDataGenerator

31 Jan

Multi-label classification is a useful capability of deep neural networks. I recently added this functionality to Keras' `ImageDataGenerator` so that you can train on data that does not fit into memory. This blog post demonstrates the functionality and walks through a complete example using the VOC2012 dataset.


[Podcast] Data Science Challenges for Non-Tech Companies

29 Jan

In this podcast, Giovanni Lanzani addresses data science challenges for non-tech companies and how to overcome them.


Spark surprises for the uninitiated

28 Jan

Recently I was teaching a Spark course. One of the exercises asked the students to split a Spark DataFrame into two non-overlapping parts.


Highlights from the new Apache Airflow 1.10.2 release

23 Jan

Apache Airflow 1.10.2 has been released, and we highlight some of its most interesting features.


Turning off our Ethereum miner

23 Jan

One and a half years ago we built our very own Ethereum miner; today we turned it off. What did we learn?