Over the last five years, the number of Data Science projects carried out by Business & Decision in sectors such as the oil industry, telephony, retail and services has risen significantly. However, some difficulties must be overcome in order to implement these types of projects efficiently. Here is an overview.
First of all, let us not forget that Data Science draws on several subject areas in which expertise is essential in order to ensure smooth and successful execution of such projects:
- Data preparation, which refers to the process of gathering all data in a single location, recoding it and formatting it so that it can be exploited.
- Statistics, the basic principles of which must be fully understood in order to handle data correctly.
- Machine Learning, the indispensable tool for managing big, evolving, streaming and even incomplete data.
- AI, which makes intensive learning and automation possible.
Didier Gaultier, Data Science & AI Director (Business & Decision), identifies four major obstacles often encountered during the implementation of Data Science projects, with some practical steps to take to overcome them.
1. The “siloed” data challenge
Very often today, companies’ data is “siloed”: each business function has its own information system (IS). Since data is the very cornerstone of the project, it is vital that companies adopt a Data-Centric approach by:
- Placing data at the heart of the IS: building a datalake/datahub
- Having a dedicated team
- Setting up data governance
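As a minimal sketch of what "un-siloing" can mean in practice, the Python snippet below merges customer records from two departmental systems into a single view keyed on a shared customer identifier. The system names (`crm`, `billing`), the field names and the sample values are all illustrative assumptions, not taken from an actual Business & Decision project.

```python
from collections import defaultdict

# Hypothetical extracts from two siloed systems (illustrative data only).
crm_records = [
    {"customer_id": "C001", "name": "Alice Martin", "segment": "B2C"},
    {"customer_id": "C002", "name": "Bruno Leroy", "segment": "B2B"},
]
billing_records = [
    {"customer_id": "C001", "invoices": 12, "unpaid": 0},
    {"customer_id": "C003", "invoices": 3, "unpaid": 1},
]


def build_customer_hub(**sources):
    """Merge records from several sources into one dict per customer_id.

    Each field is prefixed with its source name so provenance stays visible.
    """
    hub = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            key = record["customer_id"]
            for field, value in record.items():
                if field != "customer_id":
                    hub[key][f"{source_name}.{field}"] = value
    return dict(hub)


hub = build_customer_hub(crm=crm_records, billing=billing_records)
for customer_id, view in sorted(hub.items()):
    print(customer_id, view)
```

Customers known to only one system (here `C002` and `C003`) still appear in the hub with a partial view, which is the kind of gap that data governance would then need to address.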
2. Prerequisites and project organisation
Before deciding on the project scope and launching a pilot, two key prerequisites must be met.
Understanding business challenges
Gaining a sound understanding of the business and its challenges is imperative. This is key to ensuring the initiative’s success and its adoption by internal teams. All Data Science projects must therefore be introduced to business teams through workshops.
IS data and architecture diagnosis
In order to identify the opportunities and constraints associated with data, it is advisable to organise “data” workshops with internal teams and the IT Department. These will, amongst other things, help anticipate constraints that may surface during deployment: choice of architecture, tools and even programming language.
3. Management of complex algorithms
Effective management of complex algorithms is necessary in order to properly address the bias-variance trade-off problem associated with training data. However, in some sectors, constraints apply. For instance, in the banking sector, algorithms are limited by a traceability obligation.
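The bias-variance trade-off can be illustrated with a small, self-contained Python sketch. It is not drawn from the talk: it assumes synthetic noisy samples of a sine curve and a hand-written k-nearest-neighbours regressor, chosen only because its complexity is controlled by a single parameter, `k`.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_data(n):
    """Noisy samples of sin(x) on [0, 2*pi] (synthetic, illustrative data)."""
    xs = [random.uniform(0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + random.gauss(0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of k-NN predictions on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


train, validation = make_data(60), make_data(60)

errors = {}
for k in (1, 5, 60):
    errors[k] = (mse(train, train, k), mse(train, validation, k))
    print(f"k={k:>2}  train MSE={errors[k][0]:.3f}  validation MSE={errors[k][1]:.3f}")
```

With `k=1` the model reproduces the training set exactly (zero training error), which is typical of a high-variance model. With `k=60` it predicts the global mean everywhere, a high-bias model that does poorly on both sets. An intermediate value usually gives the best validation error, and finding it is the trade-off in question.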
4. The challenge of putting a model into production
During the deployment phase, the model is put into production. However, this can prove difficult, particularly in the following cases:
- Data has not been “un-siloed”
- The chosen programming language is not suitable for deployment (for example, opt for Python rather than R)
- Maintenance tools are not adequate, even though specialised tools exist (Dataiku, Knime, Azure Machine Learning, SAS)
4 Data Science project examples
At Business & Decision, experts rely on three Data Science pillars, “explain, predict and recommend”, to help customers make the most of their data. Today, Data Science can be applied to any field. Projects undertaken by the company include:
- Oil industry: development of a predictive analysis platform covering consumption, extraction levels and crude oil refining capacity for a player in the oil sector
- Telephony: improvement of a telecommunications company’s customer service level through smart management of support tickets by a “bot”
- Retail: set-up of a customer “anti-churn” (or retention) system for a French electrical products distributor
- Services: enhancement of La Poste Group’s mail delivery system, using an algorithm that dynamically defines postmen’s rounds based on delivery-to-address predictions. This project led to the creation of new services: “Expédition en boite aux lettres” (sending a parcel from your mailbox) and “Veiller sur mes parents” (look after my parents).
This article was written by Mathieu Bruniquel, a 2019 graduate of the Master of Science – MS Big Data programme at Télécom ParisTech. It follows a talk given by Didier Gaultier to Télécom ParisTech’s MS Big Data students, in which he shared his vision of the Data Scientist/Engineer profession as well as his field experience.