GoDataDriven open source contribution: May 2017 edition

A barometric low hung over the Atlantic. It moved east-ward toward a high-pressure area over Russia without as yet showing any inclination to bypass this high in a northerly direction. The isotherms and isotheres were functioning as they should. The air temperature was appropriate relative to the annual mean temperature and to the aperiodic monthly fluctuations of the temperature. The rising and setting of the sun, the moon, the phases of the moon, of Venus, of the rings of Saturn, and many other significant phenomena were all in accordance with the forecasts in the astronomical yearbooks. The water vapor in the air was at its maximal state of tension, while the humidity was minimal. In a word that characterizes the facts fairly accurately, even if it is a bit old-fashioned: It was a fine day in August 1913.

Incipit from The Man Without Qualities, by Robert Musil

I could not have written it better than Musil, so I'm just going to leave it here: we had a lot of fine days in May in the Netherlands.

The cultural parenthesis doesn't end here though! When I moved from Italy to the Netherlands, I was surprised that the Epiphany was not on the 12th day of Christmas! Instead it was on the closest Sunday.

It turns out that when a minor feast does not coincide with a public holiday, the Catholic church moves the feast to the closest Sunday! In principle this was bad news, as that meant one public holiday fewer!

However, later that year, I found out that Ascension day in the Netherlands is celebrated on a Thursday, as it should be! I got my public holiday back! And as it's near the end of the week, you can quite easily imagine how people feel compelled to skip the Friday as well!

All of this to say that with all these events going on, we didn't contribute too much. But still!

We start with our beloved Divolte Collector: before the nice weather and holidays, Andrew managed to release version 0.5. It's full of goodies, such as saving data to Google Cloud Storage and whenCommitted() support in the JavaScript API.

Also before the holidays, Niels went crazy with Airflow:

  • In PR 2270 he fixed code that would allow SQL injection;
  • In PR 2269 he was kind enough to add closing() to all connections and cursors as, admit it, it's just bad practice when you don't do that!
  • In PR 2268 he added support for ms modification time in the FTPHook;
  • In PR 2279 he fixed a breaking change by Pandas 0.2 when using BigQuery;
  • Finally, in PR 2266 he fixed the fact that no example connections were present when load_example was false.
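To make the first two items a bit more concrete, here is a minimal sketch of both habits using only Python's standard library. It is not the code from the Airflow PRs: the in-memory SQLite database, the users table, and the make_demo_db and find_users helpers are all made up for illustration.

```python
import sqlite3
from contextlib import closing


def make_demo_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # closing() calls cur.close() on exit, even if an exception is raised.
    with closing(conn.cursor()) as cur:
        cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        cur.executemany(
            "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
        )
    conn.commit()
    return conn


def find_users(conn, name):
    """Look up users by name with a bound parameter instead of string formatting."""
    with closing(conn.cursor()) as cur:
        # The "?" placeholder makes the driver treat `name` as data, never as SQL.
        cur.execute("SELECT name FROM users WHERE name = ?", (name,))
        return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    # closing() also works for the connection itself.
    with closing(make_demo_db()) as conn:
        print(find_users(conn, "alice"))
        # A classic injection attempt is just an odd-looking name here.
        print(find_users(conn, "alice' OR '1'='1"))
```

Because the value is bound rather than pasted into the query string, the injection attempt matches no rows instead of matching all of them.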

All this work by Niels put a lot of pressure on Fokko, our usual PR sprinter. Even with that burden, he managed to contribute to Airflow with PR 2307, where he enabled Sqoop logging. Then I:

  • Updated the location for the default Hadoop configuration files in hdfs3 with PR 120;
  • Made the equality in Spark DenseMatrix semantic with PR 17698 (see note 1 below);
  • Made life easier for Windows users who want to use Neovim with Python, with a Wiki change!

That's it for this edition! As always, if you have any comments, remarks, or compliments, we'd love to hear them!

  1. This basically means that if a SparseMatrix is equal to a DenseMatrix in terms of the underlying data, Spark should also treat them as equal. This was not the case when calling the DenseMatrix __eq__ method (but it was the case when calling the SparseMatrix __eq__ method!)
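The idea in the note can be sketched with two toy classes. These are not Spark's classes and this is not the code from the PR: ToyDenseMatrix, ToySparseMatrix, and the to_rows helper are hypothetical names, used only to show what comparing by underlying data in both directions could look like.

```python
class _ComparableByData:
    """Mixin: two matrices are equal when shape and entries match."""

    def __eq__(self, other):
        if not isinstance(other, _ComparableByData):
            return NotImplemented
        same_shape = (self.num_rows, self.num_cols) == (other.num_rows, other.num_cols)
        return same_shape and self.to_rows() == other.to_rows()


class ToyDenseMatrix(_ComparableByData):
    def __init__(self, num_rows, num_cols, values):
        # `values` holds every entry in column-major order.
        self.num_rows, self.num_cols = num_rows, num_cols
        self.values = list(values)

    def to_rows(self):
        return [
            [self.values[r + c * self.num_rows] for c in range(self.num_cols)]
            for r in range(self.num_rows)
        ]


class ToySparseMatrix(_ComparableByData):
    def __init__(self, num_rows, num_cols, entries):
        # `entries` maps (row, col) to a non-zero value; everything else is 0.0.
        self.num_rows, self.num_cols = num_rows, num_cols
        self.entries = dict(entries)

    def to_rows(self):
        return [
            [self.entries.get((r, c), 0.0) for c in range(self.num_cols)]
            for r in range(self.num_rows)
        ]


dense = ToyDenseMatrix(2, 2, [1.0, 0.0, 0.0, 2.0])
sparse = ToySparseMatrix(2, 2, {(0, 0): 1.0, (1, 1): 2.0})
```

With the shared comparison, dense == sparse and sparse == dense give the same answer, which is the symmetry the note describes.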
