
Understanding AI and Data Science

Does Automated Machine Learning (AutoML) really exist?

26 November 2020, updated 15 May 2023

Automated machine learning (AutoML) has existed since 1990 and has been described as a silent revolution in the field of Artificial Intelligence (AI). The term combines two parts: Automated and Machine Learning.


Machine Learning and its different types of learning

Supervised (Labeled data)
Unsupervised (Unlabeled data)
Semi-supervised (A mixture between labeled and unlabeled data)
Reinforcement learning (learning from mistakes)
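To make the difference between the first two types concrete, here is a minimal sketch using scikit-learn (our choice for illustration; the article does not prescribe a library) on its built-in iris dataset. The supervised model is trained on features and labels, while the unsupervised one only sees the features.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the model learns a mapping from the features X to the known labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
supervised_accuracy = clf.score(X, y)  # accuracy measured on the training data

# Unsupervised: no labels are given; KMeans groups similar rows into clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_  # arbitrary cluster numbers, not the true species names

print(f"supervised training accuracy: {supervised_accuracy:.2f}")
print("cluster sizes:", sorted(int((cluster_ids == k).sum()) for k in range(3)))
```

Note that the cluster numbers carry no meaning by themselves: matching them to real categories still requires human interpretation.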

AutoML aims to speed up and optimize tasks usually performed by humans, with applications in everyday life. The list of examples could be very long, but to mention a few: automatic waste classification, optimized maintenance of water-filtering membranes, and improved cybersecurity protocols for detecting attacks.

The “Auto” part refers to automating machine learning itself by using Machine Learning algorithms. In other words, we are taking AI to another level, which is why AutoML has become a hot topic in both industry and academia. However, the main question remains: is it real or not?

AutoML aims to optimize the entire pipeline of a data science project. Here, we refer to the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology and its main phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. This methodology provides a step-by-step guide for such a project. Apart from the business understanding phase, AutoML aims to automate the whole pipeline in order to make the task easier for non-experts in the field (for instance, Google's Cloud AutoML for computer vision).

Some of the advantages of AutoML:

1. Easier data preparation

Cleaning data (filtering out noise) and formatting it (for example, encoding categorical values) requires a solid background in data preparation. AutoML can accelerate this phase by automatically trying different ways to format the data and to detect noise in it.
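As an illustration of the manual preparation work that AutoML tools try to automate, here is a minimal scikit-learn sketch. The toy table and its column names are hypothetical. It fills a missing value and one-hot encodes a categorical column; an AutoML tool would search over alternatives to each of these choices instead of fixing them by hand.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric column with a missing value, one categorical column.
df = pd.DataFrame({
    "temperature": [21.5, np.nan, 19.0, 23.0],
    "site": ["north", "south", "north", "east"],
})

preprocess = ColumnTransformer([
    # Numeric column: replace missing values with the median, then standardize.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["temperature"]),
    # Categorical column: one 0/1 column per category; unseen categories are ignored.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["site"]),
])

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)  # 1 scaled numeric column + 3 one-hot columns
```

Each step here (median vs. mean imputation, scaling or not, the encoding scheme) is a decision a data scientist normally makes by hand.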

2. Avoiding the models' default parameters

Searching for the best parameters requires knowledge of Grid Search and Random Search, tuning techniques that try to find the optimal values of hyperparameters: you provide a list of candidate settings, and the method evaluates them and keeps the best ones. This whole process can be time-consuming, which is why AutoML is useful here.
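The two tuning techniques can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. The model (an SVM), the dataset and the value ranges below are illustrative assumptions, not recommendations. Grid Search evaluates every combination in the list, while Random Search samples a fixed number of candidates from distributions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Grid Search: try every combination of the listed values (3 x 3 = 9 candidates).
grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=5,
)
grid.fit(X, y)

# Random Search: sample 10 candidates from log-uniform distributions instead.
rand = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-3, 1e1)},
    n_iter=10,
    cv=5,
    random_state=0,
)
rand.fit(X, y)

print("grid search best:", grid.best_params_, round(grid.best_score_, 3))
print("random search best:", rand.best_params_, round(rand.best_score_, 3))
```

The cost of Grid Search grows multiplicatively with every parameter added, which is one reason AutoML tools favor sampling-based or model-guided search.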

3. Simpler model creation and management

Usually, the data scientist draws up a list of suitable models according to the context and the problem. This requires deep knowledge of data and business expertise in the field. AutoML makes this step easier because its pipeline tries many more models, covering most problems.
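At its simplest, automated model selection means scoring several candidate models with the same validation procedure and keeping the best one. Here is a minimal sketch with scikit-learn; the three candidates and the dataset are arbitrary choices for illustration, and real AutoML tools search far larger model spaces.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Candidate models that a data scientist might otherwise shortlist by hand.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate with the same 5-fold cross-validation, then keep the best.
scores = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} {score:.3f}")
print("selected:", best_name)
```

Using identical folds for every candidate keeps the comparison fair; the business knowledge needed to define the candidate list and the metric is what the data scientist still brings.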

4. Deep Learning (DL) Optimization

Deep Learning imitates the way the human brain processes data and creates patterns used in decision making. To apply it, we have to find the best neural network architecture for the specific problem. For example, with Keras, an open-source library for Deep Learning, building the best architecture by hand takes many lines of code. Thanks to Auto-Keras, an AutoML library for Deep Learning, we can now obtain better results with far fewer lines.
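To show what "searching for the best architecture" means without depending on a particular Auto-Keras version, here is a toy sketch using scikit-learn's `MLPClassifier`. It is a plain random search over the number and width of hidden layers; this is not the search strategy Auto-Keras uses, only the simplest instance of the idea. The layer sizes, trial count and dataset are illustrative assumptions.

```python
import random

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

rng = random.Random(0)


def sample_architecture():
    """Randomly pick 1 to 3 hidden layers, each with 16, 32 or 64 units."""
    n_layers = rng.randint(1, 3)
    return tuple(rng.choice([16, 32, 64]) for _ in range(n_layers))


results = []
for _ in range(5):  # 5 trials; real tools run many more under a time budget
    layers = sample_architecture()
    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=layers, max_iter=300, random_state=0),
    )
    model.fit(X_train, y_train)
    results.append((model.score(X_val, y_val), layers))

best_score, best_layers = max(results)  # highest validation accuracy wins
print("best architecture:", best_layers, "validation accuracy:", round(best_score, 3))
```

Architectures are compared on a held-out validation split rather than on the training data, so that a larger network is not favored simply because it memorizes better.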

Automated Machine Learning Libraries:

To explore the advantages above in more depth, here are some AutoML libraries and the phases they cover:

Phases covered	Library
Data cleaning, hyperparameter selection, model selection	The Machine Learning Box (MLBox)
Model selection, hyperparameter tuning, feature engineering	Auto-sklearn, H2O AutoML
Feature selection, feature preprocessing, feature construction, model selection, parameter optimization	TPOT (Tree-based Pipeline Optimization Tool)
Automated DL architecture design	Auto-Keras, Ludwig

List of open-source and paid tools for AutoML

Here is another list of open-source and paid tools for Automated Machine Learning.

Open source

TPOT (Tree-based Pipeline Optimization Tool)
MLBox
Auto-Sklearn
Auto-Keras
Auto-PyTorch

Paid

Google Cloud AutoML
DataRobot
PurePredictive
H2O.ai
Amazon Lex

Finally, to answer the initial question, “Is AutoML a real thing or not?”: we can't say that it doesn't exist. In my understanding, “automated” can be explained as a loop that runs through plenty of methods and processes (cleaning, formatting, parameter tuning, machine learning models, deep learning architectures…) and iterates in order to provide the best-case scenario for a specific problem.
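The loop described above can be sketched in a few lines. This is a deliberately minimal illustration, assuming scikit-learn and plain random search: each trial samples a preprocessing step, a model family and its hyperparameters, scores the resulting pipeline by cross-validation and keeps the best one. Real tools explore much larger spaces more cleverly (TPOT uses genetic programming, auto-sklearn uses Bayesian optimization and meta-learning); the search space here is a hypothetical example.

```python
import random

from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
rng = random.Random(42)

# Search space: each step of the pipeline has several interchangeable options.
SCALERS = {"standard": StandardScaler, "minmax": MinMaxScaler}


def sample_model():
    """Pick a model family at random, then random hyperparameters for it."""
    family = rng.choice(["logreg", "svc", "forest"])
    if family == "logreg":
        return family, LogisticRegression(C=10 ** rng.uniform(-2, 2), max_iter=2000)
    if family == "svc":
        return family, SVC(C=10 ** rng.uniform(-2, 2), gamma="scale")
    return family, RandomForestClassifier(
        n_estimators=rng.choice([50, 100]), max_depth=rng.choice([None, 3, 6]), random_state=0
    )


best = None
for trial in range(10):  # the "loop": a fixed budget of 10 trials
    scaler_name = rng.choice(list(SCALERS))
    family, model = sample_model()
    pipeline = Pipeline([("scale", SCALERS[scaler_name]()), ("model", model)])
    score = cross_val_score(pipeline, X, y, cv=5).mean()
    if best is None or score > best[0]:
        best = (score, scaler_name, family, pipeline)

best_score, best_scaler, best_family, best_pipeline = best
print(f"best: {best_scaler} scaler + {best_family}, CV accuracy {best_score:.3f}")
best_pipeline.fit(X, y)  # refit the winning configuration on all the data
```

The trial budget is the main knob: more trials explore more of the space at a higher computational cost.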

Today, Automated Machine Learning is still under development. On one hand, it can deliver efficient results; on the other, it still needs improvement. In particular, it is largely limited to supervised learning and faces many difficulties with unsupervised and reinforcement learning.
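One reason unsupervised learning is harder to automate is that, without labels, there is no ground truth to score candidates against, so a search loop must rely on an internal criterion instead. A minimal sketch, assuming scikit-learn and the silhouette score as that criterion, that picks the number of clusters for k-means on synthetic data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data generated around 4 centers; the true labels are discarded.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=0)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette: how close points are to their own cluster vs. the nearest other one.
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print("silhouette by k:", {k: round(float(s), 3) for k, s in scores.items()})
print("selected number of clusters:", best_k)
```

The criterion itself encodes an assumption about what a "good" clustering looks like, which is exactly the judgment that labels would otherwise provide.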

AutoML won’t replace the data scientists

AutoML won't replace data scientists, so there is no need to worry, colleagues (at least for now). Instead, we can see it as a support for data scientists and a great way to make this complex field accessible to non-experts, so that they too can benefit from Machine Learning.

To illustrate, consider Kaggle (a Machine Learning competition community): so far, competitions have been won by humans with models that were not generated by AutoML tools. As far as I know, AutoML has not won any data science competition.

So, will there be a day when a pipeline generated by AutoML wins such competitions?

Finally, I hope this was easy to understand for all of you. I am happy to discuss further and answer your questions in the comments section 😉.

Business & Decision

Data scientist with a strong command of the field thanks to his training in data science, Miloud works on all phases of data science projects: data management, Machine Learning, data visualization and reporting.

Comments (2)


imran ahmed, 07 October 2021 at 14:43

An excellent article on AutoML, but I still have a doubt: if I start using AutoML, which library should I select first, given that there are plenty of AutoML libraries?

Miloud Belarebia, 24 November 2021 at 18:36

Thanks for your comment, Ahmed. I recommend starting with auto-sklearn (https://automl.github.io/auto-sklearn/master/) to try Machine Learning, and AutoKeras (https://autokeras.com/) for Deep Learning, so that you understand both sides.
