GoDataDriven open source contribution: May 2017 edition

30 May, 2017

A barometric low hung over the Atlantic. It moved eastward toward a high-pressure area over
Russia without as yet showing any inclination to bypass this high in a northerly direction. The
isotherms and isotheres were functioning as they should. The air temperature was appropriate
relative to the annual mean temperature and to the aperiodic monthly fluctuations of the
temperature. The rising and setting of the sun, the moon, the phases of the moon, of Venus, of
the rings of Saturn, and many other significant phenomena were all in accordance with the
forecasts in the astronomical yearbooks. The water vapor in the air was at its maximal state of
tension, while the humidity was minimal. In a word that characterizes the facts fairly
accurately, even if it is a bit old-fashioned: It was a fine day in August 1913.

Incipit from The Man Without Qualities, by Robert Musil

I could not have written it better than Musil, so I’m just going to leave it here: we had a lot
of fine days in May in the Netherlands.

The cultural parenthesis doesn’t end here though! When I moved from Italy to the Netherlands, I was
surprised that the Epiphany was not on the 12th day of Christmas! Instead it was on the closest
Sunday.

It turns out that the Catholic Church, when a minor feast does not coincide with a public
holiday, moves the feast to the closest Sunday! In principle this was bad news, as that meant
one fewer public holiday!

However, later that year, I found out that Ascension Day in the Netherlands is
celebrated on a Thursday, as it should be! I got my public holiday back! And as it’s near the end
of the week, you can quite easily imagine how people feel compelled to skip the Friday as well!

All of this to say that with all these events going on, we didn’t contribute too much. But still!

We start with our beloved Divolte Collector: before the nice weather and holidays, Andrew managed to release version 0.5. It’s full of goodies, such as saving data to Google Cloud Storage and
whenCommitted() support in the JavaScript API.

Also before the holidays, Niels went crazy with Airflow:

  • In PR 2270 he fixed code that would allow SQL injection;
  • In PR 2269 he was kind enough to add closing() to all connections and cursors because, admit
    it, leaving them open is just bad practice!
  • In PR 2268 he added support for ms modification time in the FTPHook;
  • In PR 2279 he fixed a breaking change introduced by Pandas 0.20 when using BigQuery;
  • Finally, in PR 2266 he fixed the fact that no example connections were present when
    load_example was false.
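
To make the first two items concrete, here is a minimal sketch of both ideas using only the Python standard library. It is not the code from the Airflow PRs: the users table and the setup_demo, find_users and demo functions are made up for illustration. It shows a bound query parameter (so user input is never spliced into the SQL text) and contextlib.closing() around a connection and its cursors.

```python
import sqlite3
from contextlib import closing


def setup_demo(conn):
    """Create a tiny, hypothetical users table to query against."""
    with closing(conn.cursor()) as cur:
        cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        cur.executemany(
            "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
        )
    conn.commit()


def find_users(conn, name):
    """Look up users by name with a bound parameter instead of string formatting."""
    # closing() calls cur.close() when the block exits, even on an exception.
    with closing(conn.cursor()) as cur:
        # The "?" placeholder makes the driver treat `name` purely as data.
        cur.execute("SELECT name FROM users WHERE name = ?", (name,))
        return [row[0] for row in cur.fetchall()]


def demo():
    # closing() also guarantees conn.close() once we are done.
    with closing(sqlite3.connect(":memory:")) as conn:
        setup_demo(conn)
        legit = find_users(conn, "alice")
        # A classic injection attempt is just an odd name that matches nothing.
        attack = find_users(conn, "x' OR '1'='1")
        return legit, attack
```

Calling demo() returns one match for "alice" and none for the injection string, because the whole string is compared as a literal name.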

All this work by Niels put a lot of pressure on Fokko, our usual PR sprinter. Even with that
burden he managed to contribute to Airflow with PR 2307, where he enabled Sqoop logging.
Then I:

  • Updated the location for the default Hadoop configuration files in hdfs3 with PR 120;
  • Made equality in Spark DenseMatrix semantic with PR 17698 [1];
  • Made life easier for Windows users who want to use Neovim with Python, with a Wiki change!

That’s it for this edition! As always, if you have any comments, remarks, or compliments, we’d love
to hear them!

  1. This basically means that if a SparseMatrix is equal to a DenseMatrix in terms of the
    underlying data, Spark should also treat them as equal. This was not the case when
    calling the DenseMatrix __eq__ method (but it was the case when calling the SparseMatrix
    __eq__ method!)
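
A small, self-contained sketch of what data-based equality means in practice. The Dense and Sparse classes below are hypothetical stand-ins, not Spark's classes, and they use a simple row-major layout chosen only for this example. The point is that both directions of the comparison go through the same data-based check, so the result does not depend on which operand's __eq__ is called.

```python
class _Matrix:
    """Hypothetical base class: equality is defined on the data, not the type."""

    def nonzero(self):
        raise NotImplementedError

    def __eq__(self, other):
        if not isinstance(other, _Matrix):
            return NotImplemented
        # Same shape and same non-zero cells means the same matrix.
        return self.shape == other.shape and self.nonzero() == other.nonzero()


class Dense(_Matrix):
    def __init__(self, rows, cols, values):
        self.shape = (rows, cols)
        self.values = list(values)  # row-major, length rows * cols (assumption)

    def nonzero(self):
        cols = self.shape[1]
        return {
            (i // cols, i % cols): v
            for i, v in enumerate(self.values)
            if v != 0
        }


class Sparse(_Matrix):
    def __init__(self, rows, cols, cells):
        self.shape = (rows, cols)
        self.cells = dict(cells)  # {(row, col): value}

    def nonzero(self):
        # Drop explicitly stored zeros so they do not affect equality.
        return {k: v for k, v in self.cells.items() if v != 0}


dense = Dense(2, 2, [1.0, 0.0, 0.0, 2.0])
sparse = Sparse(2, 2, {(0, 0): 1.0, (1, 1): 2.0})

# Equality is symmetric: both operand orders give the same answer.
same_both_ways = (dense == sparse) and (sparse == dense)
```

With these definitions same_both_ways is True, and changing a single value in either matrix makes both comparisons False.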