GoDataDriven open source contribution: July 2018 edition
Welcome to the Open Source at GoDataDriven, July 2018 edition.
We start with Tünde and Kris, who did a phenomenal job adding support in Spark for Hive partitioned tables whose partitions have different data formats. You can find the result of their work in PR 21893. It is the product of three sessions during our GoDataDriven Fridays, and it took determination, skill, and a bit of detective work throughout the Spark code base (they touched 7 files and added more than 500 lines of code in the end).
The Spark folks are, however, reluctant to merge it. If you think the feature is important and useful to you, let your voice be heard!
To close: I contributed PRs 270 and 286 to dask-ml, although the first one might never be
merged, even though it solves an open issue. Both PRs show a nice use of decorators, with the
latter also showing how to define context managers with
That's it for this edition! Don't forget we're hiring! Especially if you are a software engineer who would like to move into the data space, get in touch: we're offering an apprenticeship starting in October.
And if you want more rambling throughout the month, follow me on Twitter: I'm gglanzani there!
The magic is used to make it easier to load the solutions to the exercises without much replication.