In the wake of Big Data, many companies have embarked on the Data Science journey, the field having established itself as the essential route for turning Big Data into knowledge and action. This blog article presents the 5 key practices to observe in order to ensure project success.
1. An agile, iterative methodology
Data Science methodology is essentially agile and iterative. It derives from inductive reasoning, which consists of exploiting data to build knowledge. The approach is step-based: assumptions are first made and then validated using statistical and/or machine learning algorithms.
As a general rule, the method used is the 6-step CRISP-DM method (Cross-Industry Standard Process for Data Mining). Following the business understanding and data understanding phases, data is prepared and recoded for modelling. The model is then built and evaluated before, finally, being deployed.
It may be necessary to repeat the process, sometimes several times, before actually implementing and deploying the model.
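The iterative nature of the process can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: the function `run_cycles`, the target score of 0.85 and the list of scores are hypothetical and stand in for real modelling work.

```python
from typing import Callable, List, Tuple

# The six CRISP-DM phases, in order.
PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]


def run_cycles(
    build_and_score: Callable[[int], float],
    target: float,
    max_cycles: int = 4,
) -> Tuple[bool, List[float]]:
    """Repeat phases 1 to 5 until the evaluation score reaches the target."""
    history: List[float] = []
    for cycle in range(1, max_cycles + 1):
        # Hypothetical callback standing in for one pass through phases 1 to 5.
        score = build_and_score(cycle)
        history.append(score)
        if score >= target:
            return True, history  # good enough: move on to deployment
    return False, history  # target never reached: do not deploy


# Illustrative scores only; a real project would compute them on held-out data.
toy_scores = [0.70, 0.80, 0.90, 0.95]
deployed, scores = run_cycles(lambda cycle: toy_scores[cycle - 1], target=0.85)
print(deployed, scores)
```

In this sketch the model is only handed over to deployment once an evaluation pass meets the agreed target, which mirrors the idea of repeating the cycle before going live.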
2. Engaging with the business
The modelling process starts and ends with the business. Initially, the project must have a general objective such as “better understand the key success factors of my points of sale”. If you do not understand a business function, then you cannot model it.
Business departments must therefore devote some time to helping Data Scientists better grasp the business issues suggested by the data. Similarly, Data Science teams have to take the time to communicate their results to business departments, using Business Intelligence techniques such as data visualisation (DataViz) and data storytelling.
3. Data quality
Data is, of course, at the very heart of the Data Science process. High-quality, well-documented data is a prerequisite for usable results.
Whilst the volume of data is not a serious cause for concern, data quality and depth are key factors in Data Science initiatives. Special effort must be made to track missing, false or inconsistent data, and particular attention must be paid to unusual observations and outliers.
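As an illustration of these checks, here is a minimal Python sketch using only the standard library. The records, the field names and the `audit` helper are hypothetical, and the 1.5 × IQR rule used to flag outliers is one common convention among several, not a method prescribed by this article.

```python
from statistics import quantiles

# Hypothetical point-of-sale records; names and values are illustrative only.
records = [
    {"store": "A", "revenue": 120.0, "visits": 40},
    {"store": "B", "revenue": None, "visits": 35},
    {"store": "C", "revenue": 130.0, "visits": -5},
    {"store": "D", "revenue": 125.0, "visits": 42},
    {"store": "E", "revenue": 950.0, "visits": 38},
    {"store": "F", "revenue": 118.0, "visits": 41},
]


def audit(rows, field, min_valid=0.0):
    """List stores with missing, inconsistent or outlying values for a field."""
    missing = [r["store"] for r in rows if r[field] is None]
    present = [r for r in rows if r[field] is not None]
    # Assumption: values below min_valid (e.g. negative counts) are inconsistent.
    inconsistent = [r["store"] for r in present if r[field] < min_valid]
    valid = [r for r in present if r[field] >= min_valid]
    # Quartiles of the valid values, then the usual 1.5 * IQR fences.
    q1, _, q3 = quantiles([r[field] for r in valid], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [r["store"] for r in valid if not low <= r[field] <= high]
    return {"missing": missing, "inconsistent": inconsistent, "outliers": outliers}


for name in ("revenue", "visits"):
    print(name, audit(records, name))
```

A flagged value is not necessarily an error: an outlier may be a data entry mistake or a genuinely exceptional point of sale, which is why such findings should be reviewed with the business before any correction.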
4. Human and organisational challenges
In Data Science, people from various backgrounds, such as statistics, machine learning, business, BI, programming and databases, must collaborate. This diversity usually poses a major challenge when setting up a Datalab.
5. Technical challenges
Finally, the 5th practice concerns the technical challenges. Data Science is by no means a new science. Rooted in statistics and machine learning, it has had to adapt to Big Data, and this adaptation has almost totally changed the way Data Science projects are managed and carried out.
The emergence of new Open Source tools and languages has also caused a major paradigm shift. Gone are the days when the practice of Data Science involved only one tool. Data Scientists now use several tools and languages (such as R or Python) to see their projects through.
A word of caution here: since not every Data Science tool or language is compatible with every Big Data infrastructure, it is strongly recommended that, if possible, you carefully define your Data Science environment before selecting a Big Data architecture.
To conclude, Data Science projects have been found to “unsilo” companies, which makes them cross-functional projects par excellence. They should therefore ideally be visible at executive management level.