GoDataDriven Open Source Contribution for Q3 2019

In the third quarter of 2019, the GoDataDriven (GDD) team contributed to no fewer than 15 different open source projects:

Various Projects

Rens contributed documentation to voila (#229). Vincent made a modest improvement to the documentation of Sense2Vec (#72). Tim contributed Growatt support to the Home Assistant Community Store (#507). Fokko improved the logging in Apache Flink (#9493) and updated dependencies in Apache Spark (#25432, #25437, #25451) to patch some security issues. He also contributed to Apache Iceberg (Incubating) (#488, #489), a project that started at Netflix and is now incubating at the Apache Software Foundation. Furthermore, he made small improvements to Airlift (#186), Presto SQL (#1603), and Apache Parquet (#674), and resurrected the build for MySQL Replicator (#43). Kris reduced the Docker footprint of Whirl by dropping an unneeded JDK dependency (#54) and improved the documentation of his docker-kafka image (#12).

Evol

Rogier contributed to Evol the pull requests #112, #128, #129, #137, #138, #143, #144, #145 and #146. These are primarily bugfixes and cleanups that made it into the 0.5.1 release.

Scruid

Bas Beelen and Barend worked together to add authentication support (#74) to the Scruid project, and Bas added a cool new logo (#68). In the meantime, Fokko updated the testing harness to use the latest version of Druid (#69). Barend improved the exception handling of unexpected HTTP status codes (#67, #70). Fokko added a missing test case (#71).

Java IBAN

Barend published versions 1.6.0 and 1.6.1 of the Java IBAN project into Maven Central, adding twelve new IBAN patterns, clarifying the use of reference data, scrubbing potentially sensitive information from the exception messages and adding some minor features to the API.

Apache Airflow

We have contributed to Airflow and the Airflow ecosystem. Bas Harenslak and Fokko traditionally take the lead here. The default behaviour for XComs has changed: command output is now discarded by default, where it used to be pushed as an XCom (#5779). Anyone using Airflow to coordinate Spark jobs should cheer.

Apache Avro

Fokko shepherded the release of Apache Avro 1.9.1. Version 1.9.1 was released quickly after 1.9.0 because a regression bug was discovered. If you're still on the Avro 1.8 branch, it is highly recommended to move to version 1.9.1. An overview of the changes can be found in a separate blog post. Not a lot of functionality has been added, but the release bumps many of Avro's dependencies that contained CVEs. Also, the dependency on Joda-Time has been removed (#631). Pull requests: #613, #623, #624, #626, #627, #629, #630, #631, #632, #633, #634, #635.

Apache Druid (Incubating)

Fokko was accepted as a committer to the Apache Druid project! Wasting no time, he took care of some version updates (#8292, #8294, #8404, #8405, #8406, #8407) and general improvements (#8234, #8235, #8340).

Scikit-Lego

Rens added repeating basis functions to scikit-lego (#171). Vincent contributed pull requests #162, #164, #167, #168 and #170, which are mostly housekeeping, and reviewed #156, which adds a cool FairClassifier. The 0.3.0 release contains these changes.

Join Us!

Are you a Data Engineer or Data Scientist who cares about open source? We're hiring!
