This seemingly old debate deserves to be revisited with a fresh perspective. Data Science, like Big Data, is a constantly evolving field that now has proven applications, notably in customer knowledge and marketing…
Statistics and machine learning in the era of Data Science and customer knowledge
Even though the field of application is fairly recent, most of the basic methods used in Data Science are some forty years old. As a reminder, the two main branches concerned are statistics on the one hand and machine learning on the other. To these I would add a third branch, which could be called “business ontologies”, i.e. “structured sets of terms and concepts representing business know-how or a field of application” (Wikipedia). These ontologies help break down business know-how into two main areas:
- One covering the data dictionary and the concepts specific to the business
- Another capturing the processes and operating procedures specific to that business
Some people are absorbed by the “statistics versus machine learning” debate, comparing the two approaches in terms of efficiency, ROI and cost in predictive applications (predictive marketing, digital marketing, customer knowledge, etc.).
What sparked the debate?
This debate is by no means new, since the two “schools” sprang from two different intellectual traditions. “Machine learning”, sometimes also referred to as “artificial intelligence”, is based on the premise that the ever-growing computational power of computers can help model specific phenomena. Statistics, for its part, is a specialised branch of mathematics that exists, at least in theory, independently of computers.
The origins of statistics can be traced back to the reign of Louis XIV, who wanted a record of the various trades existing in France at the time (the term “statistics” contains the root of the word “état”, meaning state: the science of the state). It later split into several schools, namely the French, Anglo-Saxon and Russian schools.
Today, after a remarkable evolution, the three schools have more or less started to converge. All three benefit from the exponential growth in computing power associated with Moore’s famous law, which allows their methods to be applied as increasingly powerful programmed algorithms.
Without being chauvinistic, the French school (sometimes called “statistique à la française”, or French-style statistics) remains among the most advanced in the world, at least in the academic sphere.
A little theory
The fact that we can use algorithms to predict phenomena, such as the behaviour of a customer group, remains mind-boggling and quite mysterious for many. But it is not as mysterious as it seems. All you need is a set of variables that characterise a given phenomenon across a number of actual observations, plus a variable that records the result for each observation as a logical, categorical or numeric value. The objective is then to establish a link (a model) between the output variable (the variable to be predicted) and the input variables (the predictive variables).
Put very simply, statistics and/or machine learning can be applied only if the value of the variable to be predicted is known for a limited number of observations or cases, called the “learning sample”. Analysing how well the resulting model fits these observations lets us assess its accuracy on the learning sample.
The next step is to validate the predictive model on a separate “test” sample. This checks the robustness (reliability) of the model built from the learning sample.
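As a rough illustration of the learning sample and test sample workflow described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the observations are synthetic, the helper names `fit_line` and `mean_squared_error` are hypothetical, and a simple least-squares line with one predictive variable stands in for whatever statistical or machine learning model a real project would use.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one predictive variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def mean_squared_error(xs, ys, a, b):
    """Average squared gap between observed and predicted values."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic observations (an assumption): output = 2 * input + 1 + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Learning sample: the cases used to build the model.
learn_x, learn_y = xs[:150], ys[:150]
# Test sample: cases the model has never seen, used to check robustness.
test_x, test_y = xs[150:], ys[150:]

a, b = fit_line(learn_x, learn_y)
learn_mse = mean_squared_error(learn_x, learn_y, a, b)
test_mse = mean_squared_error(test_x, test_y, a, b)

print(f"model: y = {a:.2f} * x + {b:.2f}")
print(f"error on learning sample: {learn_mse:.2f}")
print(f"error on test sample:     {test_mse:.2f}")
```

If the error on the test sample is close to the error on the learning sample, the model generalises reasonably well. A much larger test error would suggest that the model has fitted the learning sample too closely.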
Naturally, this requires data of fairly high quality, an IT infrastructure that can support the data processing, a software tool (focused on statistics and/or machine learning) and, of course, a key person, generally known as a “Data Scientist”, who will follow an approach such as CRISP-DM (Cross-Industry Standard Process for Data Mining) that provides a logical framework for the project.
The siren song often heard on the market suggests that machine learning solutions can do the job almost single-handedly, without a specialist to configure them and, moreover, with much better results than the approach described in the previous paragraph.
The fact is that, today, there are almost as many machine learning methods as statistical methods available. In practice, however, the best results are often obtained when both approaches are combined. The debate pitting one approach against the other is thus a relatively empty one: statistics and machine learning are complementary.
This becomes clear once we recognise that any predictive approach (predicting a future state based on a present state) requires a prior explanatory step (explaining a present state using a past state), which in turn requires a prior descriptive step (describing the relationships and correlations between the various variables), and maybe even the implementation of an ontology for the business concerned.
Today, statistics (whether combined with ontology or not) can definitely give true descriptive or explanatory “business” meaning to data.
So, to oppose or not to oppose?
Can we, in absolute terms, dispense with these descriptive and explanatory steps (i.e. the statistics, maybe even ontology, part) and directly apply machine learning to data to predict a phenomenon?
Even if it is possible in theory and from an IT perspective, I would not recommend it. The ease of use of these methods, often put forward by the champions of “All Machine Learning”, could make you think that non-statisticians are perfectly capable of using them. This is not the case.
The robustness and accuracy of a purely “machine learning” model do not ensure that it makes sense from a business point of view (only statistics does that).
And even if the initial results of these automatic methods were flawless from a business point of view, a non-specialist user would not necessarily be able to assess the model’s degradation over time due to the arrival of new customer populations or the need to integrate new observations.
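To make the idea of degradation over time more concrete, here is a small Python sketch of one common statistical check, the Population Stability Index (PSI), which compares the distribution of a model score between a reference population and a newer one. The article does not name this technique; it is offered only as one possible illustration. The scores, the bin edges and the function name `population_stability_index` are all assumptions made for the example.

```python
import math


def population_stability_index(expected, actual, edges):
    """PSI between two samples of a score, using shared bin edges."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(v >= e for e in edges)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    e_shares, a_shares = shares(expected), shares(actual)
    return sum(
        (a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares)
    )


# Hypothetical propensity scores for the population the model was built on.
reference_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
# A later population that looks the same, and one that has shifted upwards.
same_population = list(reference_scores)
shifted_population = [0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 0.9, 0.9]
edges = [0.25, 0.5, 0.75]

psi_same = population_stability_index(reference_scores, same_population, edges)
psi_shifted = population_stability_index(reference_scores, shifted_population, edges)

print(f"PSI, unchanged population: {psi_same:.3f}")
print(f"PSI, shifted population:   {psi_shifted:.3f}")
```

A PSI near zero suggests the new population resembles the reference one. A commonly quoted rule of thumb treats values above roughly 0.25 as a sign that the population has shifted enough for the model to be re-examined, though the threshold is a convention rather than a guarantee.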
Machine learning brings together a set of key algorithms that generate good results in campaign targeting, customisation, etc. But these results are better, more reliable and more accurate if machine learning builds on intermediate statistical results, such as typologies and propensity scores, that have been obtained in a professional manner.
In summary, machine learning and statistics are not, so to speak, rival methods but well and truly complementary ones. The best marketing and customer knowledge (CRM) results are often obtained when the two are combined…