AI Industrialization: the key steps to a MLOps approach

5 October 2022. Updated on 5 May 2023.

The industrialization of artificial intelligence, one of the 7 hot data topics for 2022, requires the implementation of MLOps. This approach involves several necessary steps, including a common platform and a feature store. This how-to guide walks through a transformation that is iterative but unavoidable.

After years spent developing PoCs, which were certainly fruitful for gaining experience, organizations now aim to move into a new phase of maturity. In particular, this phase aims to design data products with embedded artificial intelligence in an industrial way.

The objectives are quite simple, more so than the resources needed to achieve them. Companies want to develop models that can scale up, so that they can better optimise value creation and the return on investment of each use case. However, industrialization does not concern the production stage alone; it covers the whole model life cycle, including monitoring and retraining.

Orchestrating the continuous interaction between data scientists and data engineers

This industrialization approach is known by the acronym MLOps. Simply put, it consists in replicating and adapting DevOps principles in order to apply them to the field of Machine Learning (ML). A more sophisticated definition presents MLOps as a set of practices combining Machine Learning, DevOps and Data Engineering to deploy and maintain AI systems in production in a reliable and efficient manner.

That is the theoretical definition, which underlines the purpose of MLOps without really providing any operating instructions. This is perfectly normal, since what we are really talking about here is a set of good practices. MLOps implementations can therefore differ from one organisation to another, but also over time. Moreover, as is the case for DevOps, MLOps is adopted gradually rather than rolled out at full scale in one go.

Modelled after DevOps, MLOps must, in practice, reconcile two worlds: not those of Devs and Ops, but those of Data Scientists and Data Engineers. These two communities do not always share the same culture or the same programming language, even if Python is increasingly prevailing in the AI field.

MLOps: a continuous chain across the AI life cycle

DevOps and MLOps do, however, share the same ambition: a shift towards a continuous and agile process. This continuity is even more crucial in AI, as a project does not end when the model, developed in iterative mode, is in production. Once designed, the model must be trained, and possibly retrained depending on its type and uses.

The aim of MLOps is to industrialize all the steps and processes involved in developing an artificial intelligence system, up to and including keeping it in operational condition. Such a product is similar to a living organism, for which all the life cycle phases need to be industrialized. This is a specificity and a challenge compared to traditional application development. The design factory for these AI systems is often called a Data Fab or AI Fab.

The key point in MLOps is the availability of a common platform for the realization of all AI projects and actors in the organization.
DIDIER GAULTIER

The parallel with a factory can, however, be misleading. Industrialization does not mean complete automation, which is not an achievable goal in data science today. AI producing AIs autonomously is not on the agenda at the moment. MLOps mostly imposes best practices, and here are the four main ones:

1. A common platform

A single platform brings together data engineers and data scientists. Implementing this good practice is paramount. The outlines of the platform may vary, but four main categories can be identified. In its most basic form, it is a common language, regardless of whether the chosen approach is code, low-code, or both.

This common language makes it possible to hand models over between data scientists and data engineers whilst facilitating integration into the production environment.
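As a minimal sketch of what such a hand-over could look like when both teams work in Python, the example below serializes a very simple linear model, together with its metadata, into a shared artifact. The `LinearModelArtifact` class, its fields and the `export_model` / `import_model` helpers are hypothetical names chosen for illustration; they are not part of any of the platforms mentioned in this article.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LinearModelArtifact:
    """Hypothetical hand-over format: a linear scoring model plus metadata."""
    name: str
    version: str
    feature_names: list
    coefficients: list
    intercept: float

    def predict(self, row: dict) -> float:
        # Features are read by name, so both teams share the same input contract.
        return self.intercept + sum(
            coef * row[feat]
            for feat, coef in zip(self.feature_names, self.coefficients)
        )


def export_model(model: LinearModelArtifact) -> str:
    """Data scientist side: serialize the model to a text artifact."""
    return json.dumps(asdict(model), sort_keys=True)


def import_model(payload: str) -> LinearModelArtifact:
    """Data engineer side: rebuild the model from the shared artifact."""
    return LinearModelArtifact(**json.loads(payload))


if __name__ == "__main__":
    trained = LinearModelArtifact(
        name="churn_score",  # illustrative values only
        version="1.0.0",
        feature_names=["tenure_months", "support_tickets"],
        coefficients=[-0.5, 2.0],
        intercept=10.0,
    )
    served = import_model(export_model(trained))
    print(served.predict({"tenure_months": 12, "support_tickets": 3}))
```

The point of the sketch is the contract rather than the model: the version and the feature names travel with the coefficients, so the production side does not have to guess how the model expects its inputs.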

To this basic development environment can be added a “portable” AI platform, such as Dataiku, KNIME, Alteryx, SAS or DataRobot. These solutions have the advantage of running in practically all technical environments (cloud and on premises) and of connecting to all data sources.

“Hyperscaler” cloud platforms, generally proprietary, make up another category. The three main ones are Azure ML (Microsoft), Vertex AI (Google) and SageMaker (Amazon). Finally, two other platforms sit at the crossroads between infrastructure and AI: Snowflake and Databricks. These are only available in the cloud and are portable from one hyperscaler to another. Initially these platforms focused essentially on storage and processing, but they are now evolving towards AI and Data Science.

This is not a comprehensive list of platforms. The key point in MLOps is having a common platform that can be used by all the stakeholders of the organisation to carry out all AI projects.

2. A feature store

The feature store is an integral part of the MLOps approach. It could actually be seen as its principal component. Though the term is quite new, it corresponds to an existing concept, previously referred to as a Data Hub, with one slight difference: the feature store includes a new notion, feature engineering.

Feature engineering reflects a Data Science philosophy based on the good practice of designing the simplest possible AI models, ensuring limited resource consumption whilst optimising transparency and explainability. These characteristics are essential in any “ethics by design” approach. This simplicity also offers savings in terms of computation time, costs and maintainability. The other side of the coin, however, is substantial upstream work on the data: its preparation and the creation of relevant indicators based on it.

Within the context of MLOps, the aim of the feature store is thus to store all the data that is ready for use in models and to keep it up to date. Additionally, the feature store must be continuously documented and populated to feed algorithms. This is one of the tasks performed by the data engineer, who sets up data pipelines for this purpose.
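To make the idea more concrete, here is a deliberately small, in-memory sketch of the three responsibilities described above: documenting features, refreshing them from a pipeline, and serving ready-to-use values to a model. The `InMemoryFeatureStore` class and its methods are assumptions made for illustration only; a real feature store would add persistence, access control and versioning.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FeatureDefinition:
    name: str
    description: str  # the documentation travels with the feature
    values: dict = field(default_factory=dict)      # entity_id -> value
    updated_at: dict = field(default_factory=dict)  # entity_id -> timestamp


class InMemoryFeatureStore:
    """Hypothetical, minimal feature store kept entirely in memory."""

    def __init__(self):
        self._features = {}

    def register(self, name, description):
        """Declare a feature together with its documentation."""
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = FeatureDefinition(name, description)

    def ingest(self, name, rows, now=None):
        """Called by a data pipeline to refresh the values of one feature."""
        feature = self._features[name]
        timestamp = now or datetime.now(timezone.utc)
        for entity_id, value in rows.items():
            feature.values[entity_id] = value
            feature.updated_at[entity_id] = timestamp

    def get_vector(self, entity_id, names):
        """Return ready-to-use model inputs for one entity, in the given order."""
        return [self._features[n].values[entity_id] for n in names]

    def describe(self):
        """Return the catalogue of documented features."""
        return {n: f.description for n, f in self._features.items()}


if __name__ == "__main__":
    store = InMemoryFeatureStore()
    store.register("avg_basket", "Mean basket value over the last 90 days")
    store.register("visits_30d", "Number of visits in the last 30 days")
    store.ingest("avg_basket", {"customer_1": 42.5})
    store.ingest("visits_30d", {"customer_1": 3})
    print(store.get_vector("customer_1", ["avg_basket", "visits_30d"]))
    print(store.describe())
```

Calling `ingest` again with fresh rows overwrites the stored values and their timestamps, which is how a scheduled pipeline would keep the features up to date in this simplified model.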

3. Data labelling & model training

The data platform and the feature store help initiate the MLOps process, starting with the production of simple models. Their development is iterative: a model will not meet all the needs in its first iteration. Models are gradually enriched and each upgrade is deployed in production. This triggers continuous, CI/CD-style exchanges between data scientists and data engineers, which makes industrialized and replicable processes a requirement for efficiency. These exchanges and the organisation of these continuous interactions are at the heart of MLOps.

It should be noted that before being handed to the data engineer for production, the model is first trained and tested by the data scientist. It is also at this stage that data is labelled, although this can also be done upstream when the feature store is created. Industrialization covers these various operations, including training.
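One way to make this hand-over replicable is an automated check that compares a new model iteration with the one already in production on the same labelled data. The sketch below is only an illustration under simple assumptions: models are plain Python callables, the metric is accuracy, and the names `accuracy`, `should_promote` and `min_gain` are hypothetical.

```python
def accuracy(predict, labelled_rows):
    """Share of labelled examples that the model classifies correctly."""
    if not labelled_rows:
        raise ValueError("need at least one labelled example")
    hits = sum(1 for features, label in labelled_rows if predict(features) == label)
    return hits / len(labelled_rows)


def should_promote(candidate, production, holdout, min_gain=0.0):
    """Decide whether the candidate beats production by more than min_gain.

    Returns (decision, candidate_accuracy, production_accuracy).
    """
    candidate_acc = accuracy(candidate, holdout)
    production_acc = accuracy(production, holdout)
    return candidate_acc - production_acc > min_gain, candidate_acc, production_acc


if __name__ == "__main__":
    # Labelled hold-out data: (features, label) pairs, illustrative values only.
    holdout = [({"x": 1}, 1), ({"x": 2}, 1), ({"x": -1}, 0), ({"x": 0}, 0)]

    def production_model(features):
        return 1  # first iteration: always predicts the positive class

    def candidate_model(features):
        return int(features["x"] > 0)  # enriched iteration

    print(should_promote(candidate_model, production_model, holdout))
```

In a CI/CD-style pipeline, a check of this kind would typically run on every new model iteration, so that only iterations that improve on the current one reach production.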

4. Model monitoring & retraining

Models need to be monitored. The Data Scientist defines the monitoring criteria. These include, for instance, model accuracy, the number of false positives, robustness, data drift and the homogeneity of residual variances. The Data Engineer is responsible for implementing the monitoring of all these indicators.

Supervision is carried out semi-automatically by the platform, which triggers notifications in case of data drift, for example. Most of the time, a notification is followed by a manual action, an automatic model retraining or a new data labelling operation. The action taken depends on the type of model, monitoring and learning.
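As an illustration of how a data drift notification might be triggered, the sketch below computes a Population Stability Index (PSI) between a reference sample and recent data for a single feature, then maps the result to an action. PSI is only one possible drift measure and is not prescribed by this article; the bin edges and the 0.1 and 0.25 thresholds are common rules of thumb used here as assumptions, and the function names are hypothetical.

```python
import math


def bin_shares(values, edges, eps=1e-6):
    """Fraction of values falling in each bin.

    `edges` are sorted interior cut points; empty bins are floored at `eps`
    so that the logarithm in the PSI stays defined.
    """
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        index = sum(1 for e in edges if v >= e)  # number of cut points at or below v
        counts[index] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(reference, current, edges):
    """PSI = sum over bins of (current - reference) * ln(current / reference)."""
    ref = bin_shares(reference, edges)
    cur = bin_shares(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_action(psi, notify_at=0.1, retrain_at=0.25):
    """Map a drift score to an action; the thresholds are assumptions."""
    if psi >= retrain_at:
        return "retrain"
    if psi >= notify_at:
        return "notify"
    return "ok"


if __name__ == "__main__":
    training_sample = [1, 2, 3, 4, 5, 6, 7, 8]
    recent_sample = [6, 7, 8, 9, 10, 11, 12, 13]
    psi = population_stability_index(training_sample, recent_sample, edges=[3, 6])
    print(round(psi, 3), drift_action(psi))
```

In practice the "notify" outcome would usually lead to a human review, while "retrain" could start an automatic retraining job, in line with the semi-automatic supervision described above.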

MLOps advantages

Globally, 90% of AI projects are reportedly never industrialized. This figure shows the urgency and importance of adopting an MLOps approach and implementing its core principles. It is important to note that MLOps benefits not only projects in production, but also experiments such as a Proof of Value (the term PUC, for Proof of Use Case, is also used in AI).

The number one advantage of MLOps is therefore to ensure the industrialization of AI projects and their delivery to end users. Through the feature store, MLOps also makes it possible to capitalise on past work, notably through the reuse of datasets and components. These practices provide replicability, model stability and traceability, and foster a collaborative culture between data scientists, data engineers and other project players (security, IT, business sponsors, etc.).

MLOps helps to manage operational continuity as well as the features of the different versions of a model. It also brings productivity gains across the AI design chain. Setting up the foundations of MLOps is undeniably an investment, but its ROI is more than proven.

Business & Decision

Data Scientist and Director of the Data Science & Customer Intelligence offerings at Business & Decision France. Also teaches Data Mining & Statistics applied to Marketing at EPF School and ESCP-Europe.
