Data and AI: should you build your own company platform in the cloud?

22 October 2020, updated on 15 May 2023

Cloud, Data and AI: the ultimate buzzword trio… Companies’ expectations in these fields run extremely high today because of their promise of transformation and value creation. Data volumes are skyrocketing, new technologies that are disruptive for IT Departments are driving AI, and the cloud offers the means to manage all this complexity with agility! When the time comes to design your own Data platform (one able to support all of your AI initiatives: not only those you are thinking about now but, most importantly, those you have not even dreamed up yet), the decision whether or not to build it in the cloud is often nothing short of a Cornelian dilemma. From promises of high performance at low cost to cybersecurity risks and regulatory conundrums, this article takes a closer look at cloud-based Data and AI platform solutions.

For some years now, the cloud has been a key area of concern for IT Departments. While it was already widely used in some application areas such as CRM (notably thanks to Salesforce’s appeal) and in functions peripheral to the core business, Data projects had until recently been based mainly on so-called “on-premise” architectures (data stored on company servers, and hence located in the organisation’s own datacentres).

The number one constraint associated with the cloud is usually technical and concerns the volume of data managed in data information systems. These systems tend to be data-intensive (multiple terabytes, even petabytes, of data) and traditionally operate in batch mode, i.e. not in real time (data is processed in batches, so each job handles significant volumes). This constraint is, however, becoming less relevant as network capacity increases and new databases and data management tools bring real-time processing capabilities.

In fact, the real hindrance to cloud adoption today is rooted in our culture. Allowing one’s data (i.e. one’s war chest, potentially all of the company’s knowledge) to be hosted by a service provider still seems like a huge leap for many organisations… And entrusting this data to a public cloud player like Amazon Web Services (AWS), Microsoft Azure or Google Cloud Platform is a whole other ball game.

Beyond the legitimate (and smart) concerns regarding the burning issue of data security (which we will get back to), the debate is increasingly taking on the characteristics of bar-room philosophy rather than reasoned analysis.

The cloud’s siren-like lure for Data and AI projects

Admittedly, the advantages of the cloud are numerous and extremely tempting, especially with regard to Data projects:

Cost: usage-based invoicing and a lower TCO (Total Cost of Ownership), notably through reduced architecture management costs
Infrastructure: robustness, elasticity, scalability, container management
Methodology: very fast project launch and agile solutions
Applications: choice from a wide range of open solutions (marketplace system) or proprietary options tied to the Cloud operator

Moreover, resorting to Artificial Intelligence frameworks available in the cloud seems set to become common practice. Indeed, it is hard to deny that the algorithms of the likes of Google, Facebook, IBM and Microsoft, pre-trained on millions of user interactions and images, are among the most powerful and the quickest to implement.

The cloud thus seems to be the El Dorado of Data and Artificial Intelligence projects. Both a catalyst for innovation and a support for scaling up, it has been the springboard for the creation of numerous start-ups and their transformation into unicorns (Netflix, Blablacar and N26 to name but a few, N26 being one of the 100% digital troublemakers of the banking sector).

Numerous benefits for AI projects

The cloud offers many advantages for AI projects by meeting demands that are specific to the field:

Management of huge volumes of data > ability to operate large and efficient infrastructures thanks to the separation of storage and compute units
Mobilisation of substantial compute resources for a limited period of time (during the learning phase for example) > elasticity and usage-based pricing (one pays only for the compute units used during the required time). Ability to increase compute power through the use of GPUs (extremely important for some Artificial Intelligence applications such as computer vision applications)
Management of unstructured data (text, images, video, sound) > dedicated application solutions integrated into cloud services
Use of specialised algorithms > service calls to the cloud operator’s or other providers’ pre-trained algorithms (interoperability and open services)
Agile methodology based on iterative development and scaling up > scalability and DevOps
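
To make the elasticity and usage-based pricing argument concrete, here is a minimal sketch comparing what a short training phase costs when billed per hour of use with what the same capacity costs when kept available all month. Every name and figure in it (the `Workload` class, the 40 hours, the hourly rate of 3.0) is a hypothetical illustration, not a real price, and it deliberately assumes the same hourly rate in both cases to isolate the effect of utilisation.

```python
from dataclasses import dataclass

HOURS_IN_MONTH = 730  # 8,760 hours per year / 12


@dataclass(frozen=True)
class Workload:
    """A compute need. Every figure used below is a made-up illustration."""
    name: str
    busy_hours_per_month: float  # hours the resource is actually used
    hourly_rate: float           # hypothetical price per hour, arbitrary currency


def pay_per_use_cost(w: Workload) -> float:
    """Usage-based pricing: only the hours actually consumed are billed."""
    return w.busy_hours_per_month * w.hourly_rate


def always_on_cost(w: Workload) -> float:
    """Same capacity kept available all month, whether it is used or not."""
    return HOURS_IN_MONTH * w.hourly_rate


def utilisation(w: Workload) -> float:
    """Share of the month during which the resource is actually busy."""
    return w.busy_hours_per_month / HOURS_IN_MONTH


# Hypothetical learning phase: 40 hours of GPU time in the month.
training = Workload("GPU training run", busy_hours_per_month=40, hourly_rate=3.0)

if __name__ == "__main__":
    print(f"{training.name}: busy {utilisation(training):.1%} of the month")
    print(f"pay-per-use: {pay_per_use_cost(training):.2f}")
    print(f"always-on:   {always_on_cost(training):.2f}")
```

The lower the utilisation, the wider the gap between the two figures, which is why bursty workloads such as model training are the ones that benefit most from elasticity. A workload that is busy most of the month would narrow the gap considerably.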

Cloud-based Data and AI platforms: data security and protection are non-negotiable

But let’s not get carried away and succumb to the technological thrills by blindly following the siren song of cloud operators without giving the whole affair a second thought. When shifting data storage to parties that are external to the company, data security and protection must be given serious consideration.

In our virtualized world, the geographic location of data is a significant matter

First off, if the data stored is the slightest bit sensitive, you must ensure that the datacentre hosting it is located in Europe. This has seemed obvious for personal data since the GDPR entered into force, but it should also apply to all other types of critical data if we are to ensure proper protection. The subject of the sovereign cloud regularly finds its way onto the political agenda, and the issue could end up being settled by governments.

GDPR vs. the Cloud Act: the diplomatic struggle over regulation

On the regulatory front, a hitherto unseen geostrategic sparring match is taking place. Whilst Europe is wrapped in the GDPR’s cocoon of protection, the USA has forced through a controversial text named the Cloud Act (Clarifying Lawful Overseas Use of Data). As if thumbing its nose at Europe, the American text was enacted on 23 March 2018, i.e. almost exactly two months before the GDPR came into effect, calling into question, in passing, the sacred principle of data sovereignty.

In concrete terms, the text authorises American law enforcement officials to access data stored on American providers’ servers, regardless of the country in which those servers are located. This means that the U.S. police could (albeit only with a warrant or subpoena, and therefore within strict legal proceedings) access data stored in the clouds of Microsoft, Amazon, Google, Oracle or even IBM without having to worry about complying with local regulations or notifying those concerned. This creates an unprecedented diplomatic context, and international discussions on the matter seem to be at a stalemate.

This situation should also be taken into account when selecting a cloud service provider to host your data. If strategic sensitivity is crucial to your organisation (as is the case for public players and in highly regulated sectors like banking or insurance) you may prefer to opt for a national or European cloud provider. Let us hope however that international discussions resume soon and that an agreement is reached between the United States and Europe. In any event, an assessment of the risk involved and legal advice could prove helpful at the time of contract signing with a foreign cloud operator.

Reversibility and Cloud Security: trust should never preclude caution

If you are thinking about building your Data and Artificial Intelligence architecture in the cloud, your first reflex should be to make sure you have an exit strategy! This may seem surprising, but the subject of reversibility must be addressed right from the start. In addition to preparing the last-resort measure to take if the cloud service fails or proves unsatisfactory, the reversibility study will help you ask all the right questions and, ultimately, make better use of the cloud environment and its solutions. Reversibility studies are all the more important for Data and AI projects because these involve elements that are at the heart of the organisation’s operations and must therefore be kept under total control.

Another key point: you should never take data security lightly (ever!), and the subject must be addressed when deploying your data platform (whether in the cloud or not, in fact). The most sensitive data must, at the very least, be encrypted (which implies that legacy data must be mapped and classified beforehand…). Cautious organisations can set up hybrid architectures, either distributing data among several clouds or splitting it between the cloud and local storage in the company’s own datacentres.

Figure: Hybrid architecture and personal data protection
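
As an illustration of the classify-then-place logic behind such a hybrid architecture, here is a minimal sketch. It assumes a three-level sensitivity scale and a placement policy that are purely hypothetical (the `Sensitivity` levels, the location labels and the dataset names are invented for the example); a real policy would come out of your own data mapping and classification work.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Sensitivity(Enum):
    """Hypothetical three-level classification scale."""
    PUBLIC = "public"
    INTERNAL = "internal"
    SENSITIVE = "sensitive"


@dataclass(frozen=True)
class Placement:
    location: str  # hypothetical label for a storage target
    encrypt: bool  # whether the data must be encrypted at rest


# Hypothetical policy: the most sensitive data stays in the company's own
# datacentre, and everything that is not public is encrypted.
PLACEMENT_POLICY: Dict[Sensitivity, Placement] = {
    Sensitivity.PUBLIC: Placement("public-cloud", encrypt=False),
    Sensitivity.INTERNAL: Placement("public-cloud-europe", encrypt=True),
    Sensitivity.SENSITIVE: Placement("on-premise", encrypt=True),
}


def plan_placement(
    catalogue: Dict[str, Optional[Sensitivity]]
) -> Dict[str, Placement]:
    """Map each dataset to a storage target according to the policy.

    A dataset that has not been classified yet (None) is treated as
    SENSITIVE, so that missing classification never weakens protection.
    """
    plan: Dict[str, Placement] = {}
    for dataset, sensitivity in catalogue.items():
        effective = Sensitivity.SENSITIVE if sensitivity is None else sensitivity
        plan[dataset] = PLACEMENT_POLICY[effective]
    return plan


# Hypothetical catalogue resulting from a data mapping exercise.
catalogue = {
    "product_brochures": Sensitivity.PUBLIC,
    "sales_forecasts": Sensitivity.INTERNAL,
    "customer_accounts": Sensitivity.SENSITIVE,
    "legacy_archive": None,  # not classified yet
}

if __name__ == "__main__":
    for name, placement in plan_placement(catalogue).items():
        print(f"{name}: {placement.location} (encrypt={placement.encrypt})")
```

Defaulting unclassified data to the strictest level is a design choice rather than a rule: it reflects the point made above that legacy data must be mapped and classified before it can safely be moved.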

However, security fears associated with data storage in the cloud are, in my opinion, merely cultural and are bound to disappear in the coming years. After all, there was a time when people were convinced that the safest place to keep their money was under their mattress, whereas the safest place will always be a bank’s safe-deposit box, even though the bank stores large amounts of money (and therefore undoubtedly attracts more greed). The same applies to data security. Cloud players devote considerable resources to ensuring the highest level of security, resources that traditional companies would normally be unable to mobilise on their own.

Banks: the last bastion against data in the cloud?

Before wrapping up, I suggest we zoom in on a sector that Business & Decision is quite familiar with: banking. The sector is quite unique in that it handles considerable data volumes, data being in fact the cornerstone of the business. Moreover, banks closely monitor technological developments (the very ground on which Fintechs have started challenging traditional banks) and have within their ranks armadas of IT engineers to do so.

The cloud (and in particular Data ISs and AI projects) poses a real dilemma for the sector which keeps conducting studies without really being able to take a clear stance in its favour. Several attempts have been made by the likes of Société Générale and Crédit Agricole in France, but the sector’s efforts remain generally feeble regarding these new architectures for large-scale projects.

A heavily regulated sector

It must be said that banks are strictly regulated and that several significant texts have been published on the subject. I recommend two in particular:

Les risques associés au Cloud Computing (Risks associated with Cloud Computing) by the ACPR (the French Prudential Supervision and Resolution Authority), July 2013 (French text)
Recommendations regarding the use of Cloud computing by the EBA (European Banking Authority), December 2017 (English text)

In the latter text, the EBA provides a list of elements to consider when deploying banking solutions in the cloud, namely:

Systems auditability
An up-to-date register describing data stored in the cloud in detail
Informing supervisors about data stored in the cloud
Locating data in the country in which it was collected
Data security
Ability to recover or transfer data at any time in case of cloud supplier default

The text, which entered into force on 1 July 2018, lays the foundation of the precautions to take for cloud-based Data and AI projects by adding new terms to the ones already in effect in all other sectors (namely through the GDPR).
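
The elements listed above lend themselves to a simple register kept alongside the platform. The sketch below is one possible shape for such a register and for a check against the points as paraphrased in this article; it is not the EBA’s official schema, and every field name, country code and dataset in it is a hypothetical example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegisterEntry:
    """One line of a hypothetical register of data stored in the cloud."""
    dataset: str
    description: str
    provider: str              # cloud supplier hosting the data
    collected_in: str          # country where the data was collected
    stored_in: str             # country where the data is hosted
    encrypted: bool
    supervisor_informed: bool
    exit_plan_tested: bool     # recovery or transfer to another host was rehearsed


def open_points(entry: RegisterEntry) -> List[str]:
    """Return the items from the list above that this entry does not yet meet."""
    points: List[str] = []
    if entry.stored_in != entry.collected_in:
        points.append("data is not located in the country of collection")
    if not entry.encrypted:
        points.append("data is not encrypted")
    if not entry.supervisor_informed:
        points.append("supervisor has not been informed")
    if not entry.exit_plan_tested:
        points.append("recovery or transfer has not been tested")
    return points


# Hypothetical entries: one that meets every point, one that does not.
compliant = RegisterEntry(
    dataset="card_transactions",
    description="Daily card payment records",
    provider="example-cloud",
    collected_in="FR",
    stored_in="FR",
    encrypted=True,
    supervisor_informed=True,
    exit_plan_tested=True,
)
incomplete = RegisterEntry(
    dataset="marketing_leads",
    description="Prospect contact details",
    provider="example-cloud",
    collected_in="FR",
    stored_in="IE",
    encrypted=True,
    supervisor_informed=False,
    exit_plan_tested=False,
)

if __name__ == "__main__":
    for entry in (compliant, incomplete):
        print(entry.dataset, "->", open_points(entry) or "no open points")
```

Keeping the register as structured data rather than as a document also supports auditability, since the same entries can be reviewed, versioned and checked automatically.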

Data and AI platform: to cloud or not to cloud?

In conclusion, it would seem that the advantages of the cloud for Data platforms are indisputable and that the cloud’s intrinsic qualities are precious for Artificial Intelligence projects. Moreover, the belief that the cloud is less secure than traditional infrastructure is merely cultural.

However, precautions should be taken when transferring your Data and AI platform to a cloud service:

Locating data in Europe (maybe even specifically in Belgium, Switzerland, the Netherlands, …) depending on sensitivity level
Studying regulatory, legal and contractual implications very seriously
Planning for reversibility right from the start of the project
Keeping a very close eye on the security of data stored and being moved around

And there you have it, an excellent recipe for building a sustainable and robust Data architecture, able to support all of your organisation’s AI initiatives!

Business Innovation Director, Business & Decision

Data Maniac!! 20 years of experience in enterprise data capital valorisation at Business & Decision. A committed professional, Mick advises many organisations on their Data strategy and on the adoption of new digital uses.

Comment (1)

Lenny Jacobs, 23 February 2021 at 6:50

Thank you for this very nice, detailed article. I've been reading a bit on data providers on Data Hunters and, in fact, just the other day read about the need to build a competent data infrastructure in your company here https://www.data-hunters.com/a-data-driven-culture-requires-the-right-data-infrastructure/ . There is a lot to consider there and in your article before we go forward.
