More and more articles are appearing on Artificial Intelligence (AI), machine learning and deep learning, and many writers talk about AI, machine learning and data science without distinction, as if these terms were broadly interchangeable. What exactly is going on?
Let us start by describing Artificial Intelligence as the implementation of intelligent agents. According to Peter Norvig and Stuart Russell, an intelligent agent is an autonomous entity capable of perceiving its environment through sensors and acting on it through actuators, as well as learning, analysing, using knowledge and making decisions.
Historically, the first AIs did not actually “learn”. At best, they used heuristic functions combined with rules engines. Today, technological progress means that we can no longer conceive of an AI that does not “learn”, thanks in particular to recent advances in deep learning algorithms.
And indeed, “teaching” a machine is literally what “machine learning” is all about. It relies on (mainly statistical) algorithms that enable a machine to “learn” from a set of correct responses known beforehand (the training data, or learning base). Without this database, which is often very large, learning is not possible.
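As an illustration of “learning” from correct responses known beforehand, here is a minimal sketch using a 1-nearest-neighbour classifier. This technique is our own choice for brevity, not one the article prescribes, and the tiny training set and the `train` and `predict` functions are hypothetical.

```python
import math

# Hypothetical training base: (features, known correct label) pairs.
TRAINING_DATA = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.8), "cat"),
    ((5.0, 5.0), "dog"),
    ((5.5, 4.5), "dog"),
]


def train(examples):
    """For 1-NN, 'learning' simply means memorising the labelled examples."""
    if not examples:
        # Without a training base, there is nothing to learn from.
        raise ValueError("no training data: learning is not possible")
    return list(examples)


def predict(model, features):
    """Return the label of the closest training example (Euclidean distance)."""
    _, label = min(model, key=lambda example: math.dist(example[0], features))
    return label


model = train(TRAINING_DATA)
print(predict(model, (0.9, 1.1)))  # cat
print(predict(model, (6.0, 5.0)))  # dog
```

The model is only as good as its labelled examples: with more (and more varied) training data, its predictions on new inputs generally improve.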
Machine Learning, a key discipline in Artificial Intelligence
It is immediately clear from these definitions that machine learning is a major discipline of contemporary artificial intelligence. The algorithms that make learning possible have mainly been developed thanks to another, rather older discipline: statistics.
The simpler the algorithm, the closer it is to the underlying statistics; the more complex it is, the more it relies on combinations of elementary statistical approaches, which thus form the building blocks of contemporary machine learning (as the Russian data scientist and mathematician Vladimir Vapnik explains very well). In passing, we note that a more complex algorithm can potentially be more precise, but it also needs a larger training base in order to work.
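To make the point concrete, one of the simplest learning algorithms, fitting a straight line by ordinary least squares, amounts to a few elementary statistics: two means, a covariance and a variance. The following is a minimal sketch; the data points and the `fit_line` function are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # slope = covariance(x, y) / variance(x); the common 1/(n - 1) factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # hypothetical data lying exactly on y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Since Python 3.10, the standard library's `statistics.linear_regression` computes the same simple least-squares fit. More complex models combine many such elementary estimates and therefore need far more data to pin down their parameters.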
As much of the success of statistics and machine learning lies in the correct preparation and transformation of data, we are rapidly seeing the emergence of a discipline that covers at least data preparation, statistics and machine learning, which we can safely call “data science”.
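As a rough illustration of what data preparation and transformation can mean in practice, here is a minimal sketch that drops missing or malformed entries and rescales the remaining values to the range [0, 1] (min-max scaling). The raw records and the `prepare` function are hypothetical; real pipelines usually do much more.

```python
def prepare(raw_rows):
    """Clean raw string records, then min-max scale the values to [0, 1]."""
    values = []
    for row in raw_rows:
        text = row.strip()
        if not text:
            continue  # drop empty (missing) entries
        try:
            # Accept a decimal comma as well as a decimal point.
            values.append(float(text.replace(",", ".")))
        except ValueError:
            continue  # drop entries that are not numbers, such as "n/a"
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


raw = ["10", " 20 ", "", "n/a", "15,5", "30"]
print(prepare(raw))  # [0.0, 0.5, 0.275, 1.0]
```

Without steps like these, the statistical estimates and learning algorithms downstream would receive inconsistent or unusable input.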
The overarching discipline that makes it possible to develop all sorts of algorithms for AI is thus known as data science, and its practitioners as data scientists or data engineers.
It seems perfectly clear that data science and artificial intelligence have a great deal in common.
To what extent can we think of the two disciplines as the same?
The first objection that can be raised is that it is possible to work in data science without doing artificial intelligence. For example, one could conduct market research by statistically sampling a population. Such a study can quite reasonably be considered data science, without having anything to do with artificial intelligence.
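For instance, such a study might estimate the share of customers who like a product from a random sample, together with an approximate 95% confidence interval (normal approximation, z = 1.96). No model is trained and nothing is automated. The survey figures and the `proportion_ci` function below are hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Sample proportion with a normal-approximation (Wald) confidence interval."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin


# Hypothetical survey: 420 of 1,000 sampled customers say they like the product.
p, low, high = proportion_ci(420, 1000)
print(f"{p:.3f} [{low:.3f}, {high:.3f}]")  # 0.420 [0.389, 0.451]
```

The result is a one-off answer to a business question, which is precisely the kind of data science work that involves no artificial intelligence.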
Indeed, there is a whole explanatory and predictive side of data science that aims to provide “one-shot” answers to business questions, with no intention whatsoever of automating the answer.
This leads us to a first conclusion: AI covers nowhere near all the activities of data science.
So does AI fall wholly within data science?
Collecting and returning information are quite clearly integral parts of data science. Indeed, one of the major challenges of data science lies in the ability to return information properly and to explain the knowledge acquired by algorithms clearly to the services that use it.
If we consider that perceiving one’s environment through sensors is part of the information-collection process, and that acting on it directly through actuators is part of returning this information or knowledge, all that remains is to examine the “intelligence” element of AI to find out whether this activity can be included in data science.
As we have seen, this “intelligent” element is defined as the capacity of an intelligent agent to learn, analyse and use knowledge, and to make decisions. We have called this activity “machine learning” and accepted that it is an integral (and even major) part of data science.
From this we can conclude unequivocally that AI logically forms part of the broader discipline of data science. The reverse is false, since data science also includes data preparation, statistics and all kinds of studies performed using some or all of these methods.
Artificial intelligence is the most complex discipline of data science
This leads us to legitimately define data science as the conjunction of four hybrid disciplines:
- data preparation
- statistics
- machine learning
- artificial intelligence
We can thus see that these terms are not in any way interchangeable. Practitioners working in one or more of these four disciplines are all data scientists or data engineers.
To reiterate, these four disciplines are intertwined and interdependent: nowadays, without machine learning there can be no artificial intelligence, without statistics there can be no machine learning, and without data transformation, statistical modelling cannot work.
Among the disciplines of data science, AI is the most complex to implement, since it necessarily draws on the other three, from data preparation to machine learning.
However, it is not possible, without seriously misusing the language, to replace the term data science with AI, which is only one of its applications, or perhaps the culmination of a knowledge-based perspective.