Go Data Driven BLOG!

How to style transfer your own images

15 Mar

The term "style transfer" describes the operation of recomposing one image in the style of another. In this blog, we demonstrate two approaches to doing this yourself: neural style and cycle-consistent adversarial networks.

Read more...


It's time to trust your predictions

05 Mar

If you are building a prediction machine and are looking for both the best model and a fair estimate of its performance, this blog is for you, especially if you are working with time series data.
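To make the time series concern concrete, here is a minimal sketch of an expanding-window split, in which every test window comes strictly after the data used for training. It is not taken from the post itself; the function name `expanding_window_splits` and its parameters are hypothetical, and it only illustrates one common way to keep future observations out of the training set.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in chronological order.

    Each test window directly follows its training window, so the model
    is never evaluated on observations older than its training data.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train = list(range(test_start))  # everything before the test window
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Hypothetical example: 10 time-ordered observations, 3 folds of 2 test points.
splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
for train, test in splits:
    print(f"train={train} test={test}")
```

Shuffled k-fold splitting would mix past and future observations, which tends to give an overly optimistic performance estimate on time-ordered data.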

Read more...


Testing and debugging Apache Airflow

22 Feb

One of the questions I get asked the most about Apache Airflow is how to shorten the development cycle of pushing code, deploying, and manually triggering a DAG for verification to something that is locally testable without running on a live system. In this blog post I provide several pointers to testing and debugging Apache Airflow on your local machine.

Read more...


The Zen of Python and Apache Airflow

18 Feb

Apache Airflow is a Python framework for programmatically creating workflows as DAGs. This allows for concise and flexible scripts, but it can also be a downside of Airflow: since it's Python code, there are infinite ways to define your pipelines. The Zen of Python is a list of 19 Python design principles, and in this blog post I illustrate some of these principles with four Airflow examples.
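The Zen of Python ships with the interpreter as the `this` module, so the principles can be read programmatically. The sketch below relies on the module attribute `this.s`, which holds the ROT13-encoded text in CPython; treat that attribute as an implementation detail rather than a documented API.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen of Python, so capture stdout to keep things quiet.
with contextlib.redirect_stdout(io.StringIO()):
    import this

# The text is stored ROT13-encoded in `this.s` (a CPython implementation detail).
zen = codecs.decode(this.s, "rot13")

# Skip the title line and the blank line after it; the rest are the principles.
principles = [line for line in zen.splitlines()[2:] if line.strip()]
print(len(principles))
for principle in principles[:3]:
    print(principle)
```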

Read more...


AWS Machine Learning Competency Status for GoDataDriven

14 Feb

Read more...


GoDataDriven Open Source Contribution for January 2019, the Apache Edition

13 Feb

Apache edition? Yes, we did a ton of work for Apache-related projects. And by we I mean…

Read more...


Our social responsibility as a company

08 Feb

With an ever-increasing number of scandals around the practices of data-driven companies, where does GoDataDriven stand?

Read more...


Keras: multi-label classification with ImageDataGenerator

31 Jan

Multi-label classification is a useful capability of deep neural networks. I recently added this functionality to Keras' `ImageDataGenerator` in order to train on data that does not fit into memory. This blog post demonstrates the functionality and walks through a complete example using the VOC2012 dataset.
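In multi-label classification each sample can carry several labels at once, so the target is usually a multi-hot vector rather than a single class index. The sketch below shows only that encoding step in plain Python. It does not use Keras, and the helper `multi_hot`, the class list, and the sample labels are illustrative assumptions, not code or data from the post.

```python
def multi_hot(samples, classes):
    """Encode each sample's label list as a 0/1 vector over `classes`."""
    index = {name: i for i, name in enumerate(classes)}
    vectors = []
    for labels in samples:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # several positions may be set per sample
        vectors.append(row)
    return vectors


# Hypothetical labels: one image with a person, one with a car and a person,
# and one with none of the listed classes.
classes = ["bicycle", "car", "person"]
samples = [["person"], ["car", "person"], []]
targets = multi_hot(samples, classes)
for labels, vector in zip(samples, targets):
    print(labels, vector)
```

With targets in this form, a network would typically end in one sigmoid output per class, so that each label is predicted independently of the others.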

Read more...


[Podcast] Data Science Challenges for Non-Tech Companies

29 Jan

In this podcast, Giovanni Lanzani addresses data science challenges for non-tech companies and how to overcome them.

Read more...


Spark surprises for the uninitiated

28 Jan

Recently I was delivering a Spark course. One of the exercises asked the students to split a Spark DataFrame into two non-overlapping parts.

Read more...