Cloud, Data and AI: the ultimate buzzword trio… Companies’ expectations in these fields run extremely high today because of their promise of transformation and value creation. Data volumes are skyrocketing, new disruptive technologies for IT Departments are driving AI, and the cloud is offering the means to manage all the complexity with agility! When the time comes to design your own Data platform (one able to support all of your AI initiatives: not only those you are thinking about now, but most importantly those you have not even dreamed up yet), the decision of whether to build it in the cloud or not is often nothing short of a Cornelian dilemma. From promises of high performance at low cost to cybersecurity risks and regulatory conundrums, this article takes a closer look at cloud-based Data and AI platform solutions.
For some years now, the cloud has been a key area of concern for IT Departments. While it is widely used in some application areas such as CRM (notably thanks to Salesforce’s appeal) or functions peripheral to the core business, Data projects have until recently been based mainly on what is referred to as “on-premise” architectures (data stored on company servers and hence located in organisations’ datacentres).
The number one constraint associated with the cloud is usually technical and concerns the volume of data managed in Data information systems (ISs). These tend to be data-intensive (multiple terabytes, even petabytes, of data) and traditionally operate in batch mode, i.e. not in real time (data is processed in batches, meaning significant volumes are moved during each task). This constraint is however becoming less relevant with increasing network capacity and the real-time processing capabilities of new databases and data management tools.
In fact, the real hindrance to cloud adoption today finds its roots in our culture. Allowing one’s data (i.e. one’s war chest, potentially all of the company’s knowledge) to be hosted by a service provider still seems like a huge leap to take for many organisations… And entrusting this data to a public cloud player like Amazon AWS, Microsoft Azure or Google Cloud Platform is yet another ball game.
Beyond the legitimate (and smart) concerns regarding the burning issue of data security (which we will get back to), the debate is increasingly taking on the characteristics of bar-room philosophy rather than reasoned analysis.
The cloud’s siren-like lure for Data and AI projects
Admittedly, the advantages of the cloud are numerous and extremely tempting, especially with regard to Data projects:
- Cost: usage-based invoicing and reduction of TCO (Total Cost of Ownership), notably through a decrease in architecture management costs
- Infrastructure: robustness, elasticity, scalability, container management
- Methodology: very fast project launch and agile solutions
- Applications: choice from a wide range of open solutions (marketplace system) or proprietary options tied to the Cloud operator
Moreover, resorting to Artificial Intelligence frameworks available in the cloud seems set to become common practice. Indeed, it is hard to deny that the algorithms of the likes of Google, Facebook, IBM and Microsoft, pre-trained on millions of user interactions and images, are among the most powerful and the quickest to implement.
The cloud thus seems to be the El Dorado of Data and Artificial Intelligence projects. Both a catalyst for innovation and a support for scaling up, it has been the springboard for the creation of numerous start-ups and their transformation into unicorns (Netflix, Blablacar and N26 to name but a few, N26 being one of the 100% digital troublemakers of the banking sector).
Numerous benefits for AI projects
The cloud offers many advantages for AI projects by meeting demands that are specific to the field:
- Management of huge volumes of data > ability to operate large and efficient infrastructures thanks to the separation of storage and compute units
- Mobilisation of substantial compute resources for a limited period of time (during the learning phase for example) > elasticity and usage-based pricing (one pays only for the compute units used during the required time). Ability to increase compute power through the use of GPUs (extremely important for some Artificial Intelligence applications such as computer vision applications)
- Management of unstructured data (text, images, video, sound) > dedicated application solutions integrated into cloud services
- Use of specialised algorithms > service calls to the cloud operator’s or other providers’ pre-trained algorithms (interoperability and open services)
- Agile methodology based on iterative development and scaling up > scalability and DevOps
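To make the usage-based pricing argument concrete, here is a minimal sketch of the arithmetic behind it. Every figure (hourly GPU rate, purchase price, running costs, amortisation period) is a hypothetical placeholder rather than a quote from any provider, and the function names are introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TrainingJob:
    """A burst of compute needed for a limited time (e.g. a learning phase)."""
    gpu_count: int
    hours: float


def on_demand_cost(job: TrainingJob, hourly_rate_per_gpu: float) -> float:
    """Usage-based pricing: pay only for the GPU-hours actually consumed."""
    return job.gpu_count * job.hours * hourly_rate_per_gpu


def owned_cost(gpu_count: int, purchase_price_per_gpu: float,
               yearly_running_cost_per_gpu: float, years: float) -> float:
    """Cost of buying the same GPUs and running them on-premise for `years`."""
    return gpu_count * (purchase_price_per_gpu + yearly_running_cost_per_gpu * years)


def break_even_hours(purchase_price_per_gpu: float,
                     yearly_running_cost_per_gpu: float,
                     years: float, hourly_rate_per_gpu: float) -> float:
    """GPU-hours per GPU over the period above which owning becomes cheaper."""
    total_owned = purchase_price_per_gpu + yearly_running_cost_per_gpu * years
    return total_owned / hourly_rate_per_gpu


# Hypothetical figures, for illustration only.
job = TrainingJob(gpu_count=8, hours=72)
cloud = on_demand_cost(job, hourly_rate_per_gpu=3.0)
owned = owned_cost(8, purchase_price_per_gpu=10_000,
                   yearly_running_cost_per_gpu=1_500, years=3)
threshold = break_even_hours(10_000, 1_500, 3, 3.0)
```

Under these assumed numbers, a short training burst costs a small fraction of owning the hardware, and ownership only pays off beyond the computed threshold of sustained use per GPU. Real comparisons would also need to account for storage, network egress and staffing.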
Cloud-based Data and AI platforms: data security and protection are non-negotiable
But let’s not get carried away and succumb to the technological thrills by blindly following the siren song of cloud operators without giving the whole affair a second thought. When shifting data storage to parties that are external to the company, data security and protection must be given serious consideration.
In our virtualized world, the geographic location of data is a significant matter
First off, if the data stored is the slightest bit sensitive, you must ensure that the datacentre hosting it is located in Europe. This seems obvious where personal data is concerned since the GDPR entered into force, but should also apply to all other types of critical data if we are to ensure proper protection. The subject of the sovereign cloud regularly finds its way onto the political agenda, and governments may eventually settle the issue themselves.
GDPR vs. the Cloud Act: the regulatory and diplomatic struggle
On the regulatory front, a hitherto unseen geostrategic sparring match is currently taking place. Whilst Europe is wallowing in the GDPR’s cocoon of protection, the USA has decided to force through the adoption of a controversial text named the Cloud Act (Clarifying Lawful Overseas Use of Data Act). As if thumbing its nose at Europe, the American text was enacted on 23 March 2018, i.e. almost exactly two months before the GDPR came into effect, calling into question, in passing, the sacred data sovereignty principle.
In concrete terms, the text authorises American law enforcement officials to access data stored on American providers’ servers, regardless of the country in which those servers are located. This means that the U.S. police could (albeit only on the basis of a warrant or subpoena, so in the course of strict legal proceedings) access data stored in the clouds of Microsoft, Amazon, Google, Oracle or even IBM without having to worry about complying with local regulations or notifying those concerned. This creates an unprecedented diplomatic context, and international discussions on the matter seem to be at a stalemate.
This situation should also be taken into account when selecting a cloud service provider to host your data. If strategic sensitivity is crucial to your organisation (as is the case for public players and in highly regulated sectors like banking or insurance) you may prefer to opt for a national or European cloud provider. Let us hope however that international discussions resume soon and that an agreement is reached between the United States and Europe. In any event, an assessment of the risk involved and legal advice could prove helpful at the time of contract signing with a foreign cloud operator.
Reversibility and Cloud Security: trust should never preclude caution
If you are thinking about building your Data and Artificial Intelligence architecture in the cloud, your first reflex should be to make sure you have an exit strategy! This may seem surprising, but the subject of reversibility must be addressed right from the start. In addition to anticipating the last-resort measure to take if there is a problem with the cloud service or it proves unsatisfactory, the reversibility study will help you ask all the right questions and, ultimately, better exploit the cloud environment and solutions. Reversibility studies are all the more important for Data and AI projects because they include elements that are at the heart of the organisation’s operation and must therefore be kept under total control.
Another key point to consider is that you should never take data security lightly (ever!), and the subject must be addressed when deploying your data platform (whether it be in the cloud or not, in fact). The most sensitive data (which implies that legacy data must be mapped and classified beforehand…) must, at the very least, be encrypted. Cautious organisations can set up hybrid architectures, either to distribute data among several clouds, or to distribute data between the cloud and local storage in the company’s datacentres.
However, security fears associated with data storage in the cloud are, in my opinion, merely cultural and are bound to disappear in the coming years. After all, there was a time when people were convinced that the safest place to keep their money was under their mattress, whereas the safest place to keep money will always be a bank’s safe-deposit box, even if the bank stores large amounts of it (and therefore undoubtedly stirs up more greed). The same applies to data security. Cloud players devote considerable resources to ensuring the highest level of security, resources that traditional companies would normally be unable to mobilise on their own.
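As an illustration of how classification can drive a hybrid architecture, the sketch below maps each dataset's sensitivity level to a storage location and an encryption requirement. The three levels, the location labels and the rule that unclassified data is treated as the most sensitive are all assumptions made for this example, not a prescribed policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


@dataclass(frozen=True)
class Placement:
    location: str   # hypothetical labels: "public_cloud" or "on_premise"
    encrypt: bool


def place(sensitivity: Optional[Sensitivity]) -> Placement:
    """Decide where a dataset lives and whether it must be encrypted."""
    # Assumption: data that has not been classified yet is handled as if
    # it were the most sensitive, since mapping must come first.
    if sensitivity is None or sensitivity is Sensitivity.CONFIDENTIAL:
        return Placement("on_premise", True)
    if sensitivity is Sensitivity.INTERNAL:
        return Placement("public_cloud", True)
    return Placement("public_cloud", False)


def plan(datasets: Dict[str, Optional[Sensitivity]]) -> Dict[str, Placement]:
    """Build a placement plan for a catalogue of datasets."""
    return {name: place(level) for name, level in datasets.items()}


# Hypothetical catalogue, for illustration only.
catalogue = {
    "web_logs": Sensitivity.PUBLIC,
    "sales_reports": Sensitivity.INTERNAL,
    "customer_accounts": Sensitivity.CONFIDENTIAL,
    "legacy_archive": None,  # not classified yet
}
placements = plan(catalogue)
```

Keeping the policy in one small function makes it easy to review with security and legal teams, and to change when, for instance, a second cloud is added to the mix.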
Banks: the last bastion against data in the cloud?
Before wrapping up, I suggest we zoom in on a sector that Business & Decision is quite familiar with: banking. The sector is quite unique in that it handles considerable data volumes, data being in fact the cornerstone of the business. Moreover, banks closely monitor technological developments (the very ground on which Fintechs have started challenging traditional banks) and have within their ranks armadas of IT engineers to do so.
The cloud (and in particular Data ISs and AI projects) poses a real dilemma for the sector which keeps conducting studies without really being able to take a clear stance in its favour. Several attempts have been made by the likes of Société Générale and Crédit Agricole in France, but the sector’s efforts remain generally feeble regarding these new architectures for large-scale projects.
A heavily regulated sector
It must be said that banks are strictly regulated and that several significant texts have been published on the subject. I recommend two in particular:
- Les risques associés au Cloud Computing (Risks associated with Cloud Computing) by the ACPR (the French Prudential Supervision and Resolution Authority), July 2013 (French text)
- Recommendations regarding the use of Cloud computing by the EBA (European Banking Authority), December 2017 (English text)
In the latter text, the EBA provides a list of elements to consider when deploying banking solutions in the cloud, namely:
- Systems auditability
- An up-to-date register describing data stored in the cloud in detail
- Informing supervisors about data stored in the cloud
- Locating data in the country in which it was collected
- Data security
- Ability to recover or transfer data at any time in case of cloud supplier default
The text, which entered into force on 1 July 2018, lays the foundation of the precautions to take for cloud-based Data and AI projects by adding new requirements to the ones already in effect in all other sectors (notably through the GDPR).
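Several of the elements listed above (the register, data location, security and recoverability) lend themselves to a simple automated check. The following is a minimal sketch of what one register entry and its review could look like. The field names and rules are assumptions made for illustration and do not reproduce any format defined by the EBA.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegisterEntry:
    """One line of a hypothetical register of data stored in the cloud."""
    dataset: str
    provider: str
    collection_country: str
    hosting_country: str
    encrypted: bool
    export_tested: bool  # has recovery/transfer out of the provider been tested?


def find_issues(entry: RegisterEntry) -> List[str]:
    """Return the points of attention for one register entry."""
    issues = []
    if entry.hosting_country != entry.collection_country:
        issues.append("data hosted outside the country in which it was collected")
    if not entry.encrypted:
        issues.append("data not encrypted")
    if not entry.export_tested:
        issues.append("recovery or transfer of data never tested")
    return issues


# Hypothetical entries, for illustration only.
compliant = RegisterEntry("loan_scoring", "provider_a", "FR", "FR", True, True)
risky = RegisterEntry("card_payments", "provider_b", "FR", "IE", False, True)
compliant_issues = find_issues(compliant)
risky_issues = find_issues(risky)
```

A register kept as structured data like this can be reviewed regularly and shared with auditors, which also supports the auditability requirement.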
Data and AI platform: to cloud or not to cloud?
In conclusion, it would seem that the advantages of the cloud for Data platforms are indisputable and that the cloud’s intrinsic qualities are precious for Artificial Intelligence projects. Moreover, the belief that the cloud is less secure than traditional infrastructure is merely cultural.
However, precautions should be taken when transferring your Data and AI platform to a cloud service:
- Locating data in Europe (maybe even in a specific country such as Belgium, Switzerland, the Netherlands…) depending on sensitivity level
- Studying regulatory, legal and contractual implications very seriously
- Planning for reversibility right from the start of the project
- Keeping a very close eye on the security of data, both when stored and when being moved around
And there you have it, an excellent recipe for building a sustainable and robust Data architecture, able to support all of your organisation’s AI initiatives!