Inside the Dragon’s Den: LINK Data Science Summer School
Back in 2014, one of the striking findings of the Dutch National Think Tank was the growing demand for data scientists. To work towards a solution, LINK Data Science, together with a number of commercial companies, introduced the Data Science Summer School, a two-week programme that introduces high-performing students to the fascinating world of data science.
Youssef El Bouhassani, one of the founders of the LINK-Summer School, explains: "Data Science is learnt by working on real challenges faced by organisations. And that's what the LINK-Summer School is all about. The organisations provided real business cases which they started working on recently, and that is exciting both for the participants and for the partners."
From the start, GoDataDriven has been one of the companies to support this initiative.
The Data Science Summer School
Fifteen students with a keen interest in Data Science were selected to participate in the two-week programme of the Summer School. During the first week, various data science experts, including GoDataDriven’s Vincent Warmerdam, coached and trained the students by sharing their experiences and know-how from the data science field.
In the second week, the participants got to apply their newfound insights to work ‘for social good’ on three cases, submitted by NedTrain, Eneco and NS. Providing concrete solutions for these business cases would mean valuable improvements for the participating companies. On August 14th, in a Dragon’s Den setting, the Summer School concluded with the highly anticipated presentation of the three business cases. The Dragons judging the presentations came from Eneco, NedTrain and the board of the National Think Tank itself.
Presentation of business cases NedTrain, Eneco, NS
The three business cases had different scopes. The first presentation, on the NedTrain case, aimed to predict service requests based on sensor data for the Dutch Sprinter Light Train. Of all trains that are not available for service, 70% are out for corrective maintenance (instead of scheduled maintenance). Besides the damage to the public image, the total cost of this unscheduled maintenance is millions of euros a year.
The second group worked on the Eneco case and focused on predicting the output of a solar panel on any given day. By combining satellite weather data with production data, the team tried to predict the power production of solar panels.
The third and last case came from NS Stations and focused on the question: what is the best way to arrange a shop plan to optimise transactions and shopping experience in a train station? Based on sensor data from various places in the station, the group looked at the types of travellers visiting shops, visitor flow, duration and time of day.
All teams did very well and did their best to put the theory into practice. They found out that preparing the data is very time-consuming and that interpreting the data correctly is important. For example, a train that was in service suddenly had hardly any sensor readings, which turned out to be caused by the train standing still. Solar panels that use inverters with a low efficiency and a maximum capacity caused discrepancies in the data, because the inverters were not able to transmit all generated power. And what to do with sensor readings from a shop in an NS station in the middle of the night, when all shops are closed?
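To give a feel for this kind of data preparation, here is a minimal sketch of the last example: dropping shop sensor readings recorded outside opening hours before doing any analysis. It is not the code the teams used. The opening hours, the function names and the sample readings are all assumptions made up for illustration.

```python
from datetime import datetime, time

# Hypothetical opening hours; the real shop hours are not given in this post.
OPEN_FROM = time(7, 0)
OPEN_UNTIL = time(22, 0)


def during_opening_hours(timestamp, open_from=OPEN_FROM, open_until=OPEN_UNTIL):
    """Return True if the reading was taken while shops are assumed to be open."""
    return open_from <= timestamp.time() < open_until


def keep_daytime_readings(readings):
    """Keep only (timestamp, visitor_count) pairs recorded during opening hours."""
    return [(ts, count) for ts, count in readings if during_opening_hours(ts)]


# Made-up sample readings, purely for illustration.
sample = [
    (datetime(2020, 1, 6, 3, 15), 2),
    (datetime(2020, 1, 6, 8, 30), 41),
    (datetime(2020, 1, 6, 17, 45), 63),
    (datetime(2020, 1, 6, 23, 5), 1),
]

filtered = keep_daytime_readings(sample)
print(filtered)
```

Whether night-time readings should be dropped, kept as a separate signal, or used to calibrate the sensors is exactly the kind of interpretation question the teams ran into; the filter above only shows one possible choice.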
"My main objective was to spark the enthusiasm of the participants for data science, and if I have been able to share my passion for open source as well, that would be a great extra!" Vincent Warmerdam (GoDataDriven)
All groups made a lot of progress with their projects and were able to provide recommendations, but were not yet able to create a predictive model. More research, interpretation and experimentation need to be done to convert the insights into predictive models. Based on the presentations, the jury decided that team NS Stations was the winner of this year's Data Science Summer School.
We are pretty sure that the story of all participating students and the business cases will be continued!