Understanding AI and Data Science

Does Auto-Machine Learning (AutoML) really exist?

26 November 2020, updated on 15 May 2023

Automated machine learning (AutoML) has existed since 1990 and has been described as a silent revolution in the field of Artificial Intelligence (AI). The term AutoML combines two words: Automated and Machine Learning.

Machine Learning with its different types of learning

Supervised (Labeled data)
Unsupervised (Unlabeled data)
Semi-supervised (a mixture of labeled and unlabeled data)
Reinforcement learning (learning from mistakes, by trial and error)

AutoML aims to optimize and accelerate human tasks and, in doing so, to improve everyday life. The list of examples could be very long, but I will mention a few: automatic waste classification, optimized maintenance of water-filtering membranes, and improved cybersecurity protocols for detecting attacks.

The “Auto” part refers to automating Machine Learning itself, by using Machine Learning algorithms to do so. In other words, we are taking AI to another level, which is why AutoML has become a hot topic in both industry and academia. However, the main question remains: is it a real process or not?

AutoML consists of optimizing the whole pipeline of a data science project. By pipeline, we mean the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology and its main phases: business understanding, data understanding, data preparation, modeling, evaluation and deployment. This methodology provides a step-by-step guide for the project. Apart from the “business understanding” phase, AutoML aims to automate the whole pipeline so that non-experts in the field can carry out these tasks (for instance, Cloud AutoML by Google for vision).

Some of the advantages of AutoML:

1. A good background for data preparation

Cleaning data (filtering out noise) and formatting it (for example, encoding categorical values) requires a good background in data preparation. AutoML can accelerate this phase with a process that tries different ways of formatting the data and of detecting noise in it.
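To make the cleaning and formatting steps concrete, here is a minimal sketch of what such a preparation phase might do on two toy columns. The column names, the data, and the noise rule (more than 5 median absolute deviations from the median) are assumptions chosen for illustration; real AutoML tools use their own, more elaborate strategies.

```python
from statistics import median

def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def flag_noise(values, k=5.0):
    """Flag values lying more than k median absolute deviations from the median."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All values are (almost) identical: nothing can be flagged reliably.
        return [False] * len(values)
    return [abs(v - center) > k * mad for v in values]

def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return categories, rows

# Hypothetical toy columns (e.g. items in a waste-sorting dataset).
weights_kg = [20.0, 21.0, None, 19.0, 22.0, 250.0]
materials = ["paper", "glass", "paper", "metal", "glass", "paper"]

filled = impute_median(weights_kg)        # the missing weight becomes the median
noisy = flag_noise(filled)                # only the 250.0 entry is flagged
categories, encoded = one_hot(materials)  # three 0/1 columns, one per material

print(filled)
print(noisy)
print(categories, encoded[0])
```

An AutoML system would typically try several such strategies (different imputations, encodings, noise filters) and keep the combination that gives the best downstream model.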

2. Avoiding using the default parameters in the models

Searching for the best parameters requires knowledge of Grid Search and Random Search, two tuning techniques that try to find the optimal values of hyperparameters: you provide a list of candidate settings and then choose the best ones. Grid Search tries every combination in the list, while Random Search samples a fixed number of combinations at random. This whole process can be time-consuming, which is why AutoML is needed to solve the problem.
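The difference between the two tuning techniques can be sketched in a few lines. In this sketch, `validation_score` is a hypothetical stand-in for "train a model with these hyperparameters and score it on held-out data"; the hyperparameter names (`learning_rate`, `depth`) and their ranges are also assumptions made for the example.

```python
import itertools
import random

def validation_score(learning_rate, depth):
    """Hypothetical objective: peaks at learning_rate=0.1 and depth=6."""
    return 1.0 - 10 * (learning_rate - 0.1) ** 2 - 0.01 * (depth - 6) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values and keep the best."""
    best = None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(space, n_trials, seed=0):
    """Sample n_trials random combinations and keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# Grid search: 3 x 3 = 9 evaluations over fixed candidate values.
grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 6, 10]}
grid_score, grid_params = grid_search(grid)

# Random search: 20 evaluations drawn from continuous / integer ranges.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
    "depth": lambda rng: rng.randint(2, 12),
}
rand_score, rand_params = random_search(space, n_trials=20)

print("grid:  ", grid_params, round(grid_score, 3))
print("random:", rand_params, round(rand_score, 3))
```

The cost of Grid Search grows multiplicatively with every hyperparameter added to the grid, whereas Random Search keeps a fixed budget of trials; this trade-off is part of what AutoML tools manage on the user's behalf.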

3. Simpler creation and management of models

Usually, the data scientist makes a list of promising models according to the context and the problem. This requires deep knowledge of the data field as well as business expertise. AutoML makes this step easier because its pipeline already includes a range of models that can be tried on most problems.
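As an illustration of automated model selection, the sketch below scores a few candidate models with cross-validation and keeps the best one. The three candidates, the toy dataset, and the helper names are assumptions made for this example; an actual AutoML library would search a far larger set of models.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def majority_class(X, y):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label

def nearest_centroid(X, y):
    """Predict the label whose class mean is closest to the point."""
    centroids = {}
    for label in set(y):
        pts = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(c) / len(pts) for c in zip(*pts)]
    return lambda x: min(centroids, key=lambda l: sq_dist(x, centroids[l]))

def one_nearest_neighbour(X, y):
    """Predict the label of the closest training point."""
    return lambda x: min(zip(X, y), key=lambda p: sq_dist(x, p[0]))[1]

def cross_val_accuracy(fit, X, y, folds=4):
    """Average accuracy over `folds` interleaved train/test splits."""
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [t for i, t in enumerate(y) if i not in test_idx]
        predict = fit(train_X, train_y)
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Hypothetical toy dataset: two well-separated clusters.
X = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

candidates = {
    "majority_class": majority_class,
    "nearest_centroid": nearest_centroid,
    "one_nearest_neighbour": one_nearest_neighbour,
}
scores = {name: cross_val_accuracy(fit, X, y) for name, fit in candidates.items()}
best_name = max(scores, key=scores.get)  # ties go to the first candidate listed

print(scores)
print("selected:", best_name)
```

The expertise the data scientist normally brings lies in choosing which candidates belong on the list; here the list is simply fixed in advance and the loop does the comparison.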

4. Deep Learning (DL) Optimization

Deep Learning is a family of methods that loosely imitates the way the human brain processes data and creates patterns to be used in decision making. To apply it, we have to look for the best neural network architecture for the specific problem. For example, with Keras, an open source library for Deep Learning, we need many lines of code to build and tune a good architecture by hand. However, thanks to Auto-Keras, an AutoML library for DL, we are now able to obtain a better result with far fewer lines.

Automated Machine Learning Libraries:

To explore the advantages mentioned above in more depth, here are some AutoML libraries:

Phases covered	Libraries
Data cleaning, hyperparameter selection, model selection	MLBox (Machine Learning Box)
Model selection, hyperparameter tuning, feature engineering	Auto-sklearn, H2O AutoML
Feature selection, feature preprocessing, feature construction, model selection, parameter optimization	TPOT (Tree-based Pipeline Optimization Tool)
Automated DL architecture	Auto-Keras, Ludwig

List of open source and paid tools for AutoML

Here is another list, covering both open source and paid tools for Automated Machine Learning.

Open source

TPOT (Tree-based Pipeline Optimization Tool)
MLBox
Auto-Sklearn
Auto-Keras
Auto-Pytorch

Paid

Google Auto-ML
DataRobot
PurePredictive
H2O.ai
Amazon Lex

Finally, to answer the initial question: is AutoML a real thing or not? We can’t say that it doesn’t exist. In my understanding, however, “automated” is best explained as a loop that chains together many methods and processes (cleaning, formatting, parameter tuning, Machine Learning model selection, Deep Learning architecture search…) and iterates over them in order to find the best solution for a specific problem.

Today, Automated Machine Learning is still in development. On the one hand, it can give efficient results; on the other, it still needs some improvement. Indeed, it is largely limited to supervised learning and faces many difficulties with unsupervised and reinforcement learning.

AutoML won’t replace data scientists

AutoML won’t replace data scientists, so we do not need to worry, colleagues (at least for now). However, we can see it as a support for data scientists and a great way to open up this complex field to non-experts, so that they too can benefit from Machine Learning.

As a further illustration, consider Kaggle (a Machine Learning competition community). There, humans have always won with models that were not generated by AutoML tools. As far as I know, at least, AutoML has not won any data science contest.

So, will there be a day when a pipeline generated by AutoML wins such competitions?

To conclude, I hope this was easy for all of you to understand. I remain available to discuss further and to answer your questions in the comments section 😉.

Business & Decision

Miloud is a data scientist with a strong command of the field thanks to his training in data science. He is involved in all phases of data science projects: data management, Machine Learning, data visualization and reporting.

Comments (2)

imran ahmed, 07 October 2021 at 14:43

An excellent article on AutoML. I still have a doubt, though: if I want to start using AutoML, which library should I pick first, given that there are plenty of AutoML libraries?

Miloud Belarebia, 24 November 2021 at 18:36

Thanks Ahmed for your comment. I recommend starting with auto-sklearn (https://automl.github.io/auto-sklearn/master/) to try Machine Learning and AutoKeras (https://autokeras.com/) for Deep Learning, so that you get to understand both sides.
