I don't have to tell you Edinburgh is all about history. In contrast, between the 23rd and 29th of July more than 1300 Python enthusiasts gathered there to talk about the future.
The 3 Most Interesting Questions
David Beazley (dabeaz) gave a great keynote about why threads have to die, posing the question above.
David likes to re-envision threads and he showed his thredo library. When you want to cancel a thread, e.g. because it takes too long to finish or seems to be sleeping, it should of course die nicely and give back any acquired locks. So threads: watch out for dabeaz!
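To make "die nicely and give back any acquired locks" concrete, here is a minimal sketch using only the standard library. It is not thredo's API; it assumes a cooperative approach in which the worker polls a hypothetical `cancel_requested` event, and it relies on a `with` block to release the lock on every exit path.

```python
import threading
import time

lock = threading.Lock()
cancel_requested = threading.Event()  # hypothetical name, for illustration only
log = []


def worker():
    # The `with` block releases the lock however the function exits.
    with lock:
        while not cancel_requested.is_set():
            # Wait in short slices so a cancel request is noticed quickly.
            cancel_requested.wait(timeout=0.05)
        log.append("cancelled cleanly")


thread = threading.Thread(target=worker)
thread.start()
time.sleep(0.1)          # give the worker time to acquire the lock
cancel_requested.set()   # ask the worker to stop
thread.join(timeout=1.0)

print(thread.is_alive(), lock.locked(), log)
```

The limitation is that the worker has to cooperate: a thread blocked in a call that never checks the event cannot be cancelled this way.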
Anonymization of your data is not a binary process. We like to talk about data that is anonymized, but more often than not it is only anonymized to a certain extent. Andreas Dewes and Katharine Jarmul pointed out that the better we want to safeguard personal details, the more we have to sacrifice by throwing away (part of) our data.
In their workshop I learned about different levels of anonymization, such as pseudonymization, k-anonymity and differential privacy, and how you can implement them. Definitely worth a look!
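As a rough illustration of two of these levels (not the workshop's material), the sketch below pseudonymizes a name with a salted hash and measures k-anonymity before and after generalizing the quasi-identifiers. The records, field names and helper functions are all made up for this example.

```python
import hashlib
from collections import Counter

# Made-up records; "zip" and "age" act as quasi-identifiers.
records = [
    {"name": "Alice", "zip": "1011", "age": 34},
    {"name": "Bob", "zip": "1012", "age": 37},
    {"name": "Carol", "zip": "1019", "age": 31},
    {"name": "Dave", "zip": "2033", "age": 45},
    {"name": "Eve", "zip": "2031", "age": 48},
]


def pseudonymize(value, salt):
    """Replace a direct identifier with a salted hash (a weak form of protection)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def raw_quasi_identifier(record):
    return (record["zip"], record["age"])


def generalized_quasi_identifier(record):
    """Throw away detail: keep two zip digits and an age bracket of ten years."""
    decade = record["age"] // 10 * 10
    return (record["zip"][:2] + "**", f"{decade}-{decade + 9}")


def k_anonymity(rows, quasi_identifier):
    """Size of the smallest group of rows sharing the same quasi-identifier."""
    groups = Counter(quasi_identifier(row) for row in rows)
    return min(groups.values())


k_raw = k_anonymity(records, raw_quasi_identifier)
k_generalized = k_anonymity(records, generalized_quasi_identifier)
print(pseudonymize("Alice", salt="example-salt"), k_raw, k_generalized)
```

Generalizing raises k because several people now share the same coarse values, which is exactly the trade-off described above: more privacy, less detail.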
Drawing a parallel with the art world, Daniele Procida (Django core developer) explained what makes a craftsman. Opposite the sophisticated professional stands the naive one.
Although a naive programmer obviously lacks certain qualities, their work can still be valuable. For one, it can still solve the task at hand. Additionally, it may show some unintended creativity, since the programmer was not aware of, or limited by, all the rules and best practices of the community.
As programmers become more ubiquitous in our society, so will the naive ones. We have a choice: either laugh at them and think less of them, or learn from their fresh perspective and let them inspire us.
Useless but Fun
Ridiculously Advanced Python, or in the words of the trainer:
“Stuff that will get you fired.”
Francesco Pierfederici took us on a trip through increasingly advanced concepts such as decorators, descriptors and metaclasses. They shorten our code, but also make it increasingly difficult to understand for a fresh pair of eyes.
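For readers who have not met these three features, here is a small sketch of each. It is my own example, not taken from the training, and all names (`logged`, `Positive`, `Registry`, `Rectangle`) are invented for illustration.

```python
import functools

calls = []


def logged(func):
    """Decorator: record the function name on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


class Positive:
    """Descriptor: reject non-positive values on assignment."""

    def __set_name__(self, owner, name):
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(obj, self.private_name, value)


class Registry(type):
    """Metaclass: remember the name of every class created with it."""
    classes = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.classes.append(name)
        return cls


class Rectangle(metaclass=Registry):
    width = Positive()
    height = Positive()

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @logged
    def area(self):
        return self.width * self.height


rect = Rectangle(2, 3)
print(rect.area())       # the decorator records this call in `calls`
print(Registry.classes)  # the metaclass has registered the class name
```

Each feature moves behaviour away from the place where it takes effect, which is what makes such code short and also hard to follow.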
Least Related to Python
A video connection with Concordia Station in Antarctica was established. Unfortunately, they only had a 512kb/s internet connection and the audio stuttered quite a bit. The parts that had good audio were actually interesting. A few people there talked about their research, e.g. how they drill into a three-kilometre-deep ice layer to analyze air bubbles that got trapped in the ice over one million years ago. And yes, there was also a mention of Python:
“We use it for most of our programming tasks.”
Three Times Text
Attracted not only by the title, but also with the string matching use case of a recent client in mind, I attended a talk about FuzzyWuzzy, a library for Levenshtein distance matching in different flavors: it also handles partial matches and can disregard the order of tokens/words. Overall, a very basic string matching library.
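To show what Levenshtein matching and "disregarding the order of tokens" mean, here is a pure-Python sketch. It does not use FuzzyWuzzy and does not reproduce its exact scoring; the function names and the normalization by the longer string's length are my own assumptions.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to a score between 0 (different) and 1 (identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def token_sort_similarity(a, b):
    """Compare after lowercasing and sorting the words, so word order is ignored."""
    def normalize(text):
        return " ".join(sorted(text.lower().split()))
    return similarity(normalize(a), normalize(b))


print(levenshtein("kitten", "sitting"))
print(similarity("New York Mets", "Mets New York"))
print(token_sort_similarity("New York Mets", "Mets New York"))
```

The plain score penalizes the reordered team name, while the token-sorted variant treats both strings as identical.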
Text analysis in social media raises some interesting questions, such as how to deal with irony or emojis, and how to keep words such as ‘won’t’ and ‘prime minister Winston Churchill’ intact/together during tokenization. Unfortunately, there are no answers yet. A useful tip: import nltk and use nltk.download() to get some sample text data sets.
Thomas Aglassinger gave a nice description of how to use SpaCy, an NLP library, as a tool for sentiment analysis. SpaCy automatically splits text into sentences and tokens, gives you the lemma or part-of-speech for a token, supports emojis, and is even extendable. Main take-away: language is difficult, so start very simple.
Two Bits of Open Source
What can you learn from a fourteen-year-old? Well, a lot actually, looking at EduBlocks created by Joshua Lowe. Josh created a drag and drop version of Python 3 to introduce Python to children at an earlier age. Of course, adults can also benefit from this platform.
EduBlocks fills the gap between drag and drop programming like Scratch and text-based programming languages. It enables people who lack the necessary typing skills to slowly get used to the syntax and constructs used in Python. It is very lightweight, as it runs on a Raspberry Pi, and since it is open source everyone is invited to contribute.
Anna Dorogush from Yandex presented their open-sourced gradient boosting algorithm CatBoost. It should be much more stable under hyperparameter tuning, as it grows symmetric trees. Furthermore, it handles categorical data without preprocessing; in fact, it is faster because it only deals with these features in the algorithm itself. On CPU the runtime is only slightly worse than that of other gradient boosting implementations, but on GPU it is much faster. There are many tutorials available, e.g. on SHAP values for feature importance.
The basis for the next two talks was laid during our GDD Fridays. At GoDataDriven we get one Friday a month (GDD Friday) during which we can work on anything we like.
There were some technical issues when GDD'ers Bas Harenslak and Vincent Warmerdam entered the stage. A defective projector resulted in a nice interaction between the presenters and the technician, who pressed the spacebar on Vincent's laptop behind the scenes each time the presenter waved his arms.
In their talk they explained how they over-engineered an algorithm to beat Vincent's girlfriend at a card game named SushiGO. They used AWS Chalice and AWS Lambda to simulate many games in search of an optimal strategy.
Next GDD'er on stage was Marcel Raas, talking about his passion for music and deep learning. He started by generating music from random numbers and sinusoids and continued to show how to use deep learning to detect and filter specific instruments in music.
When Marcel played his digitally composed tunes in front of the audience it definitely felt like people were having fun with Keras!
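The first step, generating music from random numbers and sinusoids, can be sketched with the standard library alone. This is not Marcel's code: the sample rate, the pentatonic scale and the function names are assumptions chosen to keep the example small, and the deep learning part is left out entirely.

```python
import io
import math
import random
import struct
import wave

SAMPLE_RATE = 8000  # samples per second, kept low to keep the example light


def tone(frequency, seconds, volume=0.5):
    """A pure sine wave as a list of floats in the range [-volume, volume]."""
    n_samples = int(SAMPLE_RATE * seconds)
    return [
        volume * math.sin(2 * math.pi * frequency * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def random_melody(n_notes, seed=0):
    """Pick random notes from a C major pentatonic scale (frequencies in Hz)."""
    rng = random.Random(seed)
    scale = [261.63, 293.66, 329.63, 392.00, 440.00]
    samples = []
    for _ in range(n_notes):
        samples.extend(tone(rng.choice(scale), seconds=0.25))
    return samples


def to_wav_bytes(samples):
    """Encode the samples as 16-bit mono PCM in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
        wav.writeframes(frames)
    return buffer.getvalue()


melody = random_melody(n_notes=8)
wav_bytes = to_wav_bytes(melody)
print(len(melody), len(wav_bytes))
```

Writing `wav_bytes` to a file with a `.wav` extension gives something you can play in any audio player.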
A nice quote to end with, stated by Ian Ozsvald (co-organizer of PyData London) during his aptly named talk 'Citizen Science with Python'. His self-proclaimed mission was to convert us all to data scientists.
He showed some examples of how to use our skills for good, such as a drone-based tracking system by Dirk Gorissen to locate orangutans in the jungle. He also described how he gathered data from his wife to understand why she sneezes so much, which led to a different prescription from the doctor.