Retrospect on Spark Summit 2016
The European Spark Summit took place on October 25-27 in Brussels. Over 1,000 Spark enthusiasts gathered to attend training and listen to keynotes from Matei Zaharia, Ion Stoica, and Andy Steinbach.
This year, GoDataDriven was asked to deliver training and to give a keynote presentation. Needless to say, we were honored and seized the opportunity with both hands.
Spark Summit Training Day
On the first day of the Summit, the training day, Andrew Snare geared up to explore Wikipedia using Spark and teach the 100 participants in the room a thing or two. Luckily, Andrew was joined by three training assistants (TAs), including Kris Geusebroek. The TAs handled questions and remarks from the participants so that Andrew could focus on the training.
Kris Geusebroek remarked: "First of all, it was great to meet the people behind Databricks. The training went well. As a TA I did not get to sit still, but that made the effort rewarding in the end. Even more thrilling was the positive feedback from the participants, which, I must say, was a great accomplishment by Andrew as a trainer and by the rest of the TAs."
Spark Summit - The Conference
The first day of the two-day conference focused on developers, while the second day focused on the enterprise. This separation reflects the general theme that became apparent during these two days: Spark has become part of the core of Big Data and Data Science tooling, and the focus has now shifted from what we can do with it to how we can create value with it.
The two days featured awesome keynotes, including one that combined beer (did you know that the Netherlands now has more breweries than Belgium?) and Max Verstappen in the same talk! Yes, this was the keynote delivered by our very own COO, Renald Buter.
Experiences at the Spark Summit
Quite a few consultants from GoDataDriven attended the Spark Summit. The general experience was a very positive one, with lots of information and fresh insights. For Bas Harenslak this was his first conference.
"The developer day was a great learning experience. I mostly attended sessions on testing, monitoring and debugging Spark, and learned about useful tricks and tools such as Vegas (Vega visualisation + Scala), SparkLint (a monitoring tool for Spark jobs) and Spark profiling with flame graphs", says Bas. "The second day was the enterprise day. Although I prefer the developer topics, it was still an interesting day, with talks on structured streaming, containerised Spark and of course Renald’s keynote!"
A recurring topic in several talks was the availability of whole-stage codegen in Spark 2.0 for improving execution performance. It would have been good to have more presentations on Structured Streaming, since it was only recently released with Spark 2.0. Besides the technical content, the conference was well organised, with nice food and drinks.
Jelte Hoekstra attended as well. "Many presentations focused on first use of Spark, for example migrating to Spark from Hive or from a small data set-up. ETL is definitely a vital aspect of data science, but personally, I would say: more distributed machine learning! Perhaps at a next Summit they could try different formats in addition to just presentations. That would be nice!"