Data Engineer: which training programs to choose?

9 April 2020, updated on 15 May 2023

To all young people wishing to embark on a career in Data Science, my advice has been to begin with a Data Engineering job rather than starting directly as a Data Scientist. Today, I would like to walk you through the apprenticeships and training programmes that will help you become a Data Engineer.

Data Engineer: which training programmes to choose?

A Data Engineer has a thorough understanding of Big Data ecosystems such as Spark or Hadoop and, of course, knows how to program them. Data Engineers specifically perform the following tasks:

Operationalize the Big Data infrastructure
Handle the ingestion of data into the infrastructure and its delivery out of it
Take care of data preparation and first-level recoding
Program, automate and optimize algorithms in the target infrastructure
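
To make the ingestion and first-level recoding tasks more concrete, here is a minimal sketch in plain Python. The sample data, the column names and the COUNTRY_LABELS table are all hypothetical and exist only for illustration; a real pipeline would read from the target infrastructure rather than from an in-memory string.

```python
import csv
import io

# Hypothetical raw extract: inconsistent casing, stray spaces, a missing value.
RAW_CSV = """customer_id,country,amount
1, fr ,12.50
2,FR,
3,be,7.00
"""

# Hypothetical recoding table from raw country codes to clean labels.
COUNTRY_LABELS = {"FR": "France", "BE": "Belgium"}


def recode_row(row):
    """Return a cleaned copy of one raw record."""
    code = row["country"].strip().upper()
    amount = row["amount"].strip()
    return {
        "customer_id": int(row["customer_id"]),
        "country": COUNTRY_LABELS.get(code, "Unknown"),
        # Keep missing amounts as None instead of guessing a value.
        "amount": float(amount) if amount else None,
    }


def ingest(text):
    """Read CSV text and return the list of recoded records."""
    reader = csv.DictReader(io.StringIO(text))
    return [recode_row(row) for row in reader]


records = ingest(RAW_CSV)
for record in records:
    print(record)
```

The point of the sketch is the separation of concerns: ingestion only reads the records, while recoding normalises codes and types one record at a time, which makes the step easy to test and to automate later.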

“A Data Engineer is first and foremost an IT engineer”

A Data Engineer is first and foremost an IT engineer. As such, formal IT training from universities and engineering schools, specialising in Big Data and, of course, Data Engineering, is perfectly suited to the position.

It should preferably include the most advanced courses available in the Python and Scala languages and, of course, high-level proficiency in SQL and in the SQL engines of the Big Data world, such as Hive, Impala or Spark SQL.
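
As an illustration of the kind of SQL a Data Engineer writes every day, here is a small sketch using Python's built-in sqlite3 module. The sales table and its contents are hypothetical. SQLite stands in here only because it runs anywhere; a simple aggregation of this shape should carry over largely unchanged to Hive, Impala or Spark SQL, although each dialect differs in its details.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 5.0), ("north", 2.5)],
)

# Count the sales and sum the amounts for each region.
QUERY = """
SELECT region, COUNT(*) AS n_sales, SUM(amount) AS total
FROM sales
GROUP BY region
ORDER BY region
"""
totals = conn.execute(QUERY).fetchall()
conn.close()
print(totals)
```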

Technical education programmes (we are not talking about “soft skills” here, which will be addressed in a later article) have to focus on major areas covering, at the very least, Big Data, the Cloud, DevOps methods and, of course, Artificial Intelligence.

Regarding Big Data, the essentials are naturally Spark and Hadoop. Hadoop comes with a whole ecosystem of related technologies, such as ZooKeeper, Hive, NiFi, Oozie and Kafka. A significant number of these technologies are Java-based, so it is a good idea to have a sound knowledge of Java to better control the environment, but you do not have to be a Java EE developer to become a Data Engineer (and even less so to become a Data Scientist).

Spark is now a must

Spark, on the other hand, is absolutely essential. You can approach it in two ways: through Python, by means of PySpark, or through the Scala language. Either option works, but having both strings to your bow would be ideal.
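
Whichever language you choose, the programming model is the same. The sketch below reproduces the classic word count in plain Python so that it runs without a cluster; it is not Spark code. The three steps are named after the PySpark RDD methods they mimic (flatMap, map and reduceByKey), and the reduce_by_key helper and the sample lines are hypothetical. In Spark, each step would be distributed across the machines of the cluster instead of running in a single process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: two short lines of text.
lines = ["spark and hadoop", "spark and python"]

# Step 1 (flatMap in Spark terms): split every line into words.
words = list(chain.from_iterable(line.split() for line in lines))

# Step 2 (map): emit one (key, value) pair per word.
pairs = [(word, 1) for word in words]


# Step 3 (reduceByKey): sum the values that share the same key.
def reduce_by_key(pairs):
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


counts = reduce_by_key(pairs)
print(counts)
```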

Regarding the Cloud, there is no lack of commercial providers, so choices must be made. What matters is mastering the Spark and Hadoop infrastructures in the target cloud(s) that you have chosen to study. Each cloud has its own technical characteristics, in particular the Artificial Intelligence APIs offered by its provider, which you would be wise to get to know.

Carrying out an AI project

This brings us to Artificial Intelligence (AI), a field in which Python, of course, reigns supreme. Beware, however: in addition to Python itself, you will need expert knowledge of quite a few Python libraries in order to see a whole project through. Here we can only mention the most important libraries, such as NumPy, pandas, Matplotlib, scikit-learn and MLlib. You should also have a good understanding of version control tools such as Git, GitHub and GitLab, and of notebooks such as Jupyter and Zeppelin.

It is always possible to carry out an AI project 100% in Python, but no customer (internal or external) will want to buy such a solution, because it will be too expensive and too difficult to maintain.

Therefore, you must also be able to work with the Artificial Intelligence platforms available on the market. There are many of them, and the objective of this article is not to draw up an exhaustive list or to compare them. On this point, I would draw your attention to, for example, Gartner’s 2019 Magic Quadrant for Data Science and Machine Learning Platforms.

Technical and project management training

Besides the technical education programmes outlined above, you should also consider training in project management methods, namely DevOps methods and the inescapable CRISP-DM method (referred to here as CRISP). The Scrum method must also be understood, but beware of “poor combinations” of methods in Data Science: CRISP, for instance, must always be given priority over Scrum. To support DevOps, the Docker and Kubernetes technologies are particularly interesting.

This is why Business & Decision has launched the Data School (École de la Data) in France to train Data Engineers and, ultimately, Data Scientists and Data Analysts. The idea behind this project is to ensure that the talented young people who join our ranks are fully operational after attending Business & Decision’s Data School. The course lasts three months and complements what is taught in engineering schools and universities.

Business & Decision

Data Scientist – Director of the Data Science & Customer Intelligence offerings at Business & Decision France. Also teaches Data Mining & Statistics applied to Marketing at EPF School and ESCP-Europe.
