Back in 2012, the Harvard Business Review published an article with a telling title: “Data Scientist: The Sexiest Job of the 21st Century”. Years later, we revisit this vision in light of technological developments, particularly in the field of Artificial Intelligence. The profession is enjoying a well-deserved surge in popularity, but success in it requires combining a strong academic background with technical skills and extensive experience gained on real-world cases.
Data Scientist: a professional with a comprehensive skill set
The term Data Scientist describes a professional who possesses not only expertise in multivariate statistics, Machine Learning, predictive analysis and programming, but also in-depth knowledge of the business processes (s)he works with, as explained in a KDnuggets article by Andrew Silver.
Data Science programming is not programming in the traditional software-developer sense of the term. It relies on Data Science libraries such as pandas and scikit-learn in Python, or MLlib on Spark, whose correct use requires sound theoretical knowledge of statistics and Machine Learning. It is all the more specialised because it may involve Big Data architectures such as Hadoop or Spark, which demand another level of technical know-how before any programming task can even be attempted. Over the past several years, this extreme specialisation has led to the emergence of a new profession: the Data Engineer.
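To give a concrete idea of what this library-driven programming looks like, here is a minimal sketch in Python, assuming pandas and scikit-learn are installed. The dataset (scikit-learn's bundled Iris data), the model and its parameters are illustrative choices only, not a recommended method. The point is that almost every line encodes a statistical decision (feature scaling, regularisation strength, cross-validation scheme) that the programmer has to understand to use it correctly.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small public dataset: X is a pandas DataFrame of features,
# y a pandas Series holding the class labels.
X, y = load_iris(return_X_y=True, as_frame=True)

# Each step is a statistical choice: standardise the features, then fit a
# regularised logistic regression (C is the inverse regularisation strength).
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))

# 5-fold cross-validation estimates accuracy on data held out from training,
# instead of scoring the model on the very data it was fitted on.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Swapping in another estimator is a one-line change; judging whether the resulting comparison is fair and meaningful is where the theoretical knowledge comes in.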
Data Engineer: the Big Data and Data Science specialist
Data Engineers have a thorough command of Big Data ecosystems such as Spark and Hadoop, among others, and of the associated programming methods and techniques. Their tasks include, in particular:
- Operationalise Big Data infrastructure
- Handle data ingestion into the infrastructure and data delivery out of it
- Prepare the data and carry out first-level re-coding
- Program, automate and optimise algorithms in the target infrastructure
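The preparation and first-level re-coding task can be sketched with pandas. This is a minimal illustration assuming a small, hypothetical customer extract: the column names, the sample values, the median imputation and the one-hot encoding are all assumptions chosen for the example. A production version would typically run on the target Big Data infrastructure (Spark, for instance) rather than in memory.

```python
import pandas as pd

# Hypothetical raw extract, as it might arrive from an ingestion job:
# inconsistent labels, numbers stored as text, and missing values.
raw = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "segment": ["Retail", "retail ", "CORPORATE", None],
    "revenue": ["1200.50", "n/a", "830", "415.2"],
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """First-level re-coding: normalise labels, fix types, impute, encode."""
    out = df.copy()
    # Normalise free-text categories (trim whitespace, lower-case) and
    # make missing categories explicit instead of leaving them empty.
    out["segment"] = out["segment"].str.strip().str.lower().fillna("unknown")
    # Convert text to numbers; unparseable values such as "n/a" become NaN.
    out["revenue"] = pd.to_numeric(out["revenue"], errors="coerce")
    # Impute missing revenue with the median of the observed values.
    out["revenue"] = out["revenue"].fillna(out["revenue"].median())
    # One-hot encode the category so downstream algorithms get numeric inputs.
    return pd.get_dummies(out, columns=["segment"], prefix="segment", dtype=int)


clean = prepare(raw)
print(clean)
```

Each of these choices (which value to impute, how to encode categories) changes what the algorithms later see, which is why preparation is far more than a clerical step.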
The Data Engineer profession is, however, itself evolving into a new one: the Data Architect, or Data Science architect.
Data Scientist: a complementary path
The Data Scientist’s path complements that of the Data Engineer: his(her) role consists in mastering not only statistics and algorithms, at a higher level, but also the business side and the needs of corporate functions. The Data Scientist therefore relies on the Data Engineer for the more technical aspects.
“Thinking that a Data Scientist can single-handedly handle everything is unrealistic”
This division of labour is logical, recognised in the profession, and effective. Thinking that a Data Scientist can single-handedly handle everything is unrealistic. To acquire the necessary business expertise and keep his(her) algorithmic and Machine Learning skills up to date, the Data Scientist will inevitably have to forgo some technical training, particularly in programming and infrastructure.
People with very different backgrounds can become Data Engineers and Data Scientists. Young graduates from engineering schools or universities, however strong their technical background, will have to gain experience and maturity before they can fully understand and take on board their customers’ challenges and deploy a comprehensive set of algorithms. Only then can they be considered Data Scientists, as initially defined by Dr Conway in 2010.
Data Science expertise is the product of a solid theoretical foundation in statistics and Machine Learning combined with intensive practice in problem-solving and data analysis across various environments and customer contexts. Even the selection of algorithms during the modelling phase requires extensive experience. Most Data Science and Artificial Intelligence algorithms are now in the public domain and thus, in theory, accessible to all. However, they cannot all be taught in an initial training curriculum.
From Data Engineer to Data Scientist
Based on these observations, we steer young Data Science graduates towards Data Engineer jobs first, so that they can later move into the Data Scientist profession. Note that companies, across all sectors, have far more vacant Data Engineer positions than Data Scientist positions. This is hardly surprising, since 80% of the Data Science workload involves data re-coding and preparation. Data that has not been properly prepared will inevitably produce results that are disappointing at best and false at worst.
Moreover, there can be a long way to go between a prototype (or POC, Proof of Concept) that serves as a feasibility test and an automated, full-scale Data Science application that can be put into production. Making that transition is precisely what Data Engineers excel at.
Data Scientist, Data Engineer, Data Architect: three jobs, each sexier than the last, because, as you will surely have realised by now, Data Science is not only a matter of expertise; it is also all about teamwork!