Can external people be productive data scientists?
Now that some larger services companies are starting to hire "data scientists", the hype is clearly heating up. So time to consolidate! In a series of short blog posts I will touch upon some topics related to data science and describe how I see and feel about those.
Skills and domain knowledge
In a recent article, Omid Shiraji, CIO at Working Links, is quoted as saying that "[y]ou need skilled people internally that understand your data and how it creates something useful for the organisation. Externally contracted people can't do that." Now, since we run a business where you can hire external, highly skilled people, I started to ponder this statement.
I presume the basis of Shiraji's argument is the well-known Data Science Venn Diagram. This diagram labels data science as the overlap of "Hacking Skills", "Math & Statistics Knowledge" and "Substantive Expertise". Based on my experience in data science, I do not doubt that the combination of "Math" and "Hacking" is essential, but I've always struggled with "Substantive Expertise". The reason is that "Math" and "Hacking" are transferable skill sets (they can be applied to any domain), while "Substantive Expertise" is tied to one domain. For example, I could apply my knowledge of time series analysis in a Financial domain or in a Health domain, while much of the domain knowledge from the Financial domain is not directly applicable in the Health domain (unless you would cynically remark that Health is all about Finance nowadays).
People with domain knowledge are very valuable to any organisation, and we need actual business problems to guide our analyses and development. Otherwise, we're just doing research. Moreover, it helps when a data scientist knows the organisation, what it stands for and what its current problems and challenges are. It also definitely helps to know where to get internal data sets and how to interpret these data. But still, these questions can be put to people who are not data scientists. The same goes for the interpretation of analyses done by data scientists: when clearly presented in the context of the business problems at hand, people who have substantive expertise can interpret the results and contribute.
An example from science
The question of whether external people can be productive reminds me of research I did when I was at Leiden University. Back then, I looked into converging fields of science. For me, converging fields are those that start to merge their research. Often such merging is temporary, but every so often it results in a new field. A well-known example is bioinformatics, where advanced statistical and computational techniques are combined with (biological) genetics. Often, this merging is the result of external researchers who start to apply tools and techniques from their own field to research topics of some other field. In the papers of such converging fields, researchers combine references to both fields: one for the methods and techniques and one for the research topics and domain knowledge. Subsequent papers are submitted to journals in the "external" field and, when deemed relevant and up to standard, these papers become part of the knowledge base of the converging field.
The take-home message is that those "external" scientists were able to produce relevant research in a field that was (initially) not their own. This again illustrates that skills related to tools and techniques can be transferred to other knowledge domains. For me, this shows that external data scientists with an inquisitive, open attitude can be productive in any domain, provided that they have access to people who have substantive expertise and who are willing to transfer that expertise to the data scientists.
Now, I strongly applaud companies that start to invest in strong teams of internal data scientists. As I stated in a previous blog post, this is much better than investing only in technology. It is also way better than investing only in external people, because then you will not internalise data science capabilities. In the end, I think that perhaps it's better to change "Substantive Expertise" in the Data Science Venn Diagram to "Inquisitive Attitude and Business Sense". This skill is the ability to learn fast, combined with an open mindset towards, and a sincere interest in, the domain where you are applying your skills. Such people can be external to your organisation and still create business value.
But to suggest that external people cannot be productive is simply not true.