Data Mesh is not, strictly speaking, a technological approach, but data domains need powerful technical resources to develop their products. The data platform and its infrastructure act as facilitators, unifying initiatives and rationalising the technologies used. To play that role, they need two essential characteristics, agility and automation, so that resources can be consumed on demand or in self-service mode. The Self-Service Data Infrastructure as a Platform is the third of the four pillars of Data Mesh.
1- Data Mesh: the ultimate model for data-driven companies?
2- Data domains: Data Mesh gives business domains superpowers.
3- Data Mesh: data is a product
4- Data infrastructure self-service as the technological driving force behind Data Mesh
5- Data Mesh: federated governance to guarantee efficiency
A platform that is federated, interoperable and provisionable on demand
The Self-Service Data Infrastructure as a Platform is the technical pillar of Data Mesh. Its purpose is to equip the data domains and to host the data products they develop. To enable domain autonomy and the distribution of data, Data Mesh requires a federated, interoperable platform or infrastructure whose resources the domains can consume in self-service mode.
As a technological pillar, the Self-Service Data Infrastructure as a Platform therefore aims to position IT as a facilitator of the Data Mesh approach. It also aims to prevent an explosion in the technologies used for data-related projects.
Data Mesh nonetheless relies on consolidating infrastructure and data services. The rationalisation efforts of recent years must not be swept away by, for example, a proliferation of data storage solutions within the organisation.
As the line from Spider-Man goes: ‘With great power comes great responsibility’. More than ever, therefore, the IT Department safeguards the consistency of technological choices.
Data Mesh: federated infrastructure and data platform for greater efficiency
One of the key points of Data Mesh is that it is technology-agnostic: it can be implemented with any type of tool or database. With an ETL tool or custom development, with SQL or NoSQL databases, with traditional reporting or data visualisation tools, or even with Data Science studios or programming languages… everything is possible! What matters is that data exchange and access are standardised across the organisation so that domains can interact effectively.
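To make "standardised access" concrete, here is a minimal sketch, assuming the organisation agrees on one small read contract for every data product. The names `DataProductPort`, `SqlDataProduct`, `DocumentDataProduct` and `describe` are hypothetical, chosen for illustration; SQLite stands in for a SQL database and a list of dicts stands in for a NoSQL document store. The consumer only depends on the contract, not on the storage technology behind it.

```python
import sqlite3
from typing import Protocol


class DataProductPort(Protocol):
    """Hypothetical organisation-wide contract for reading a data product."""

    name: str

    def schema(self) -> list[str]: ...

    def read(self) -> list[dict]: ...


class SqlDataProduct:
    """A data product stored in a relational database (SQLite stands in here)."""

    def __init__(self, name: str, conn: sqlite3.Connection, table: str) -> None:
        self.name = name
        self._conn = conn
        self._table = table  # trusted, platform-controlled identifier

    def schema(self) -> list[str]:
        # LIMIT 0 returns no rows but still populates cursor.description.
        cursor = self._conn.execute(f"SELECT * FROM {self._table} LIMIT 0")
        return [column[0] for column in cursor.description]

    def read(self) -> list[dict]:
        cursor = self._conn.execute(f"SELECT * FROM {self._table}")
        columns = [column[0] for column in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


class DocumentDataProduct:
    """A data product stored as schemaless documents (a list of dicts stands in)."""

    def __init__(self, name: str, documents: list[dict], fields: list[str]) -> None:
        self.name = name
        self._documents = documents
        self._fields = fields

    def schema(self) -> list[str]:
        return list(self._fields)

    def read(self) -> list[dict]:
        # Project every document onto the declared fields so consumers always
        # receive the same shape; missing fields come back as None.
        return [{f: doc.get(f) for f in self._fields} for doc in self._documents]


def describe(product: DataProductPort) -> str:
    """A consumer that only knows the contract, not the storage technology."""
    return f"{product.name}: {product.schema()} ({len(product.read())} rows)"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

sales = SqlDataProduct("sales.orders", conn, "orders")
crm = DocumentDataProduct(
    "crm.customers", [{"customer_id": 7, "country": "FR"}], ["customer_id", "country"]
)

for product in (sales, crm):
    print(describe(product))
```

The design choice here is structural typing: any domain can plug in a new storage technology as long as it honours the same two methods, which is one way the platform can stay technology-agnostic while keeping access uniform.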
Whatever building blocks it is made of, the infrastructure must therefore be federated and available as a platform whose services the various domains can consume on demand.
The platform encompasses services, grouped together in an application catalogue, which can be activated by the domains according to their needs and those of their products. It also includes a set of infrastructure resources, which will be allocated, again according to the needs of the domains.
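The catalogue-plus-allocation model described above can be sketched as follows. This is an illustration under stated assumptions, not a reference design: the class `SelfServicePlatform`, the service names and the per-domain unit quotas are all hypothetical. A domain may only activate services that IT has placed in the catalogue, and each request is checked against a quota before resources are allocated.

```python
from dataclasses import dataclass, field


@dataclass
class SelfServicePlatform:
    """Minimal self-service platform: domains draw on an IT-approved catalogue."""

    # Hypothetical catalogue: service name -> maximum units one domain may hold.
    catalogue: dict[str, int]
    # Current allocations, keyed by (domain, service).
    allocations: dict[tuple[str, str], int] = field(default_factory=dict)

    def request(self, domain: str, service: str, units: int) -> int:
        """Activate (or scale up) a catalogue service for a domain, on demand."""
        if service not in self.catalogue:
            # Consistency of technological choices: off-catalogue tools are refused.
            raise ValueError(f"{service!r} is not in the platform catalogue")
        if units <= 0:
            raise ValueError("units must be positive")
        key = (domain, service)
        total = self.allocations.get(key, 0) + units
        if total > self.catalogue[service]:
            raise ValueError(f"{domain!r} would exceed its {service!r} quota")
        self.allocations[key] = total
        return total

    def usage(self, domain: str) -> dict[str, int]:
        """Resources currently allocated to one domain."""
        return {svc: n for (dom, svc), n in self.allocations.items() if dom == domain}


platform = SelfServicePlatform(catalogue={"postgres": 8, "object-storage": 100})
platform.request("sales", "postgres", 2)
platform.request("sales", "object-storage", 40)
platform.request("marketing", "postgres", 4)
print(platform.usage("sales"))  # {'postgres': 2, 'object-storage': 40}
```

Rejected requests leave the allocations unchanged, so a domain asking for an unlisted database, or for more than its quota, gets an error rather than a silent proliferation of technologies.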
This vision of a unified platform has several advantages, notably in terms of rationalisation but also of supervision, which can be carried out from a single point by dedicated teams. The data platform must also be managed as a shared resource, whose development is controlled and planned along a roadmap driven by the needs of its users (the domains).
Data Mesh: reaffirming the IT department’s role in the data platform
Through its role in providing a catalogue of application and infrastructure services, the organisation’s IT Department consolidates its function, whilst also developing it. It is less involved in the ‘project’ aspect (the development of data products is the responsibility of the domains) and instead focuses on the deployment, maintenance and technical support of infrastructure and application services.
In this context, the mission of the IT Department is therefore crucial. Through its operations and its expertise in the underlying technologies, it enables and facilitates all data initiatives undertaken by the domains. And to this end, the IT Department defines the rules for the technological pillar.
In operational terms, IT is therefore involved in three areas:
- Infrastructure (resource provisioning capabilities, computing, storage, orchestration, etc.)
- Supervision and governance of infrastructure and expenditure (the latter via FinOps).
- Development of the platform as a product for the benefit of the domains.
Which technologies for Data Mesh?
The IT Department organises the provision of self-service technical resources. The term can be confusing in the world of data, where self-service usually refers to access to business data via data visualisation (dataviz) tools. In the context of Data Mesh, self-service describes the ability to provide and allocate the hardware and application resources of the data platform at the request of the data domains.
In addition, two technological approaches make it much easier to implement Data Mesh: the cloud and data virtualisation.
What these two technological trends have in common is that they allow for flexible allocation and strong control of hardware resources. They can also be based on different underlying technological building blocks while offering a high level of standardisation of data access. In addition, their elasticity and scalability will make it possible to manage the increases and decreases in load related to the evolution of data products.
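Elasticity is often implemented with a target-tracking rule: the platform sizes a resource pool so that each unit runs near a chosen utilisation, within a floor and a ceiling. The sketch below assumes a data product whose load is measured in queries per second; the function name `desired_units`, the 70% target and the 1 to 20 unit bounds are illustrative assumptions, not values from any particular cloud provider.

```python
import math


def desired_units(
    load: float,
    capacity_per_unit: float,
    target_utilisation: float = 0.7,
    min_units: int = 1,
    max_units: int = 20,
) -> int:
    """Target-tracking rule: size the pool so each unit runs near the target."""
    if capacity_per_unit <= 0 or not 0 < target_utilisation <= 1:
        raise ValueError("capacity must be positive and utilisation in (0, 1]")
    if load < 0:
        raise ValueError("load cannot be negative")
    needed = math.ceil(load / (capacity_per_unit * target_utilisation))
    # Clamp between a floor (availability) and a ceiling (cost control).
    return max(min_units, min(max_units, needed))


# Load on a data product over a day, in queries per second (illustrative).
profile = [20, 300, 1000, 5000, 40]
print([desired_units(q, capacity_per_unit=100) for q in profile])  # [1, 5, 15, 20, 1]
```

The ceiling matters as much as the floor: in a distributed Data Mesh setting it is what keeps one domain's traffic spike from turning into an unbounded bill.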
With this critical level of agility in mind, DevOps and DataOps approaches are at the heart of the IT Department’s strategy. Data Mesh relies on modern data engineering practices such as continuous integration and continuous deployment (CI/CD), without which industrialisation at company scale is hard to imagine. For maximum flexibility, Infrastructure as Code can also be favoured.
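The core idea of Infrastructure as Code is that a domain declares the resources it wants, and tooling computes and applies the difference with what already exists, similar in spirit to the plan/apply cycle of tools such as Terraform. The sketch below is a toy reconciler under that assumption; `plan`, `apply_plan` and the resource names are hypothetical and do not reproduce any real tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update" or "delete"
    resource: str


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> list[Action]:
    """Diff the declared (desired) state against what currently exists."""
    actions = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(Action("create", name))
        elif desired[name] != actual[name]:
            actions.append(Action("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(Action("delete", name))
    return actions


def apply_plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, dict]:
    """Execute the plan against a copy of the current state and return it."""
    state = {name: dict(spec) for name, spec in actual.items()}
    for action in plan(desired, actual):
        if action.kind == "delete":
            del state[action.resource]
        else:  # create or update: converge on the declared spec
            state[action.resource] = dict(desired[action.resource])
    return state


desired = {
    "sales-warehouse": {"engine": "postgres", "size": "medium"},
    "sales-bucket": {"engine": "object-storage", "size": "large"},
}
actual = {
    "sales-warehouse": {"engine": "postgres", "size": "small"},
    "legacy-cube": {"engine": "olap", "size": "small"},
}
for action in plan(desired, actual):
    print(action.kind, action.resource)

new_state = apply_plan(desired, actual)
print(plan(desired, new_state))  # [] : applying again changes nothing
```

The last line shows the property that makes this approach fit CI/CD pipelines: once the state has converged, re-running the same declaration is a no-op, so deployments can be repeated safely.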
Let us remember that Data Mesh remains agnostic in terms of the technologies used. The concept does not favour public cloud infrastructure over private cloud or on-premise infrastructure. It can therefore be implemented within a hybrid IT architecture, or starting from a traditional data warehouse, data lake or data hub of the ‘data-centric era’. However, these existing architectures will have to be adapted to make them more agile and to ensure standardised access to data.
Cloud and FinOps approach to Data Mesh
With the emergence of the cloud, the world of IT infrastructure has undergone a major technological revolution in recent years. Companies understand this and are investing heavily in agile environments to support their transformation efforts.
This trend is reflected in a constant increase in spending. IDC’s ‘Worldwide Quarterly Enterprise Infrastructure Tracker: Buyer and Cloud Deployment’ report estimates that spending on cloud infrastructure will exceed $90 billion globally by 2022. IDC also expects cloud infrastructure spending to displace on-premise infrastructure budgets.
There are a number of reasons why companies are adopting the cloud, and one of them is the challenge of data. The cloud is seen as a way to break down silos and accelerate projects involving the use of data and artificial intelligence. Thus, Cloud Data Platforms are now emerging as the new El Dorado for companies.
These platforms, and the underlying infrastructure, are undoubtedly an asset when implementing a Data Mesh approach: by design, these technologies support the Self-Service Data Infrastructure as a Platform pillar.
However, using the cloud as part of a Data Mesh approach requires good practices for monitoring expenses. In terms of cost management, the deployment of the platform must therefore go hand in hand with a FinOps approach. This is essential because consumption, and therefore expenditure, is distributed across the data domains. It is vital to have the means to maintain a federated view and keep overall expenditure under control.
As such, each domain can monitor its consumption and potentially integrate these costs (in euros and CO2 emissions) into its own budget. Expenditure can thus be broken down precisely enough to allow supervision by domain, project, application brick or product.
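A per-domain breakdown of this kind usually starts from billing lines tagged with ownership metadata. The sketch below assumes a hypothetical tagged billing export with one record per domain, project and product, carrying a cost in euros and an emissions figure in kg of CO2; the record layout, `breakdown` and `over_budget` are illustrative, not the format of any specific provider. It aggregates both metrics along any tag and flags domains that exceed their budget, which is one way to keep a federated view of distributed spending.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One line of a (hypothetical) tagged billing export."""

    domain: str
    project: str
    product: str
    cost_eur: float
    co2_kg: float


def breakdown(records: list[UsageRecord], by: str) -> dict[str, tuple[float, float]]:
    """Total euros and kg of CO2 per domain, project or product."""
    if by not in {"domain", "project", "product"}:
        raise ValueError(f"cannot group by {by!r}")
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for record in records:
        bucket = totals[getattr(record, by)]
        bucket[0] += record.cost_eur
        bucket[1] += record.co2_kg
    return {key: (round(eur, 2), round(co2, 2)) for key, (eur, co2) in totals.items()}


def over_budget(records: list[UsageRecord], budgets: dict[str, float]) -> list[str]:
    """Domains whose spending exceeds their budget (no budget counts as zero)."""
    spent = breakdown(records, "domain")
    return sorted(d for d, (eur, _) in spent.items() if eur > budgets.get(d, 0.0))


records = [
    UsageRecord("sales", "churn", "orders", 120.0, 3.5),
    UsageRecord("sales", "churn", "customers", 80.0, 2.0),
    UsageRecord("marketing", "campaigns", "leads", 300.0, 9.0),
]
print(breakdown(records, "domain"))  # {'sales': (200.0, 5.5), 'marketing': (300.0, 9.0)}
print(over_budget(records, {"sales": 250.0, "marketing": 250.0}))  # ['marketing']
```

Treating a missing budget as zero is a deliberate choice in this sketch: any domain spending without an allocated budget shows up in the report instead of going unnoticed.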
💡Self-Service Data Infrastructure as a Platform: things to remember
📌Federated, interoperable and self-service infrastructure/platform
📌Application and infrastructure resources catalogue
📌Catalogue of APIs (standardisation of access to data products)
📌Well-established DevOps culture
📌The IT Department guarantees the consistency of technological choices
📌Data Mesh is technology-agnostic
📌‘Obvious Data Mesh’ technologies: the cloud and data virtualisation
This article was written in collaboration with Christophe Auffray.