The Big Data blog series will explore the definitions of Big Data, business benefits, requirements & readiness, tools & technologies, and overarching success tips with risk mitigation.
Big data can deliver transformational benefits and enable enterprises to outperform their competitors by 20% in each financial metric. Sounds attractive? Yet ask someone close to you to define ‘Big Data’ and you will likely hear something about high volumes of data. This is true. You may also hear another speak eloquently about data science and advanced analytics opening doors to new areas and avenues for innovation. This is also true. You might hear yet others speak of the tools, as if all you need is to buy a particular product and voilà, you are in big data. It does not quite work that way.
This first part of the blog series will cover the overall definition while the next post in this series will delve into the business benefits and value proposition. Let us start digging into the definition.
Volume: some people define big data by the volume of information. For instance, the IDG predicts the amount of data in the digital universe will exceed 40 zettabytes by 2020. In data management & analytics, we deal with gigabytes and terabytes on a daily basis, petabytes less often, and exabytes rarely. Strictly speaking, volume is a part of the big data definition but not the full landscape.
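To put the units above in perspective, here is a minimal sketch of the arithmetic, assuming decimal (SI) units in which each step is 1,000 times the previous one. The helper names `to_bytes` and `how_many` are made up for this illustration.

```python
# Decimal (SI) storage units; each one is 1000x the one before it.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def to_bytes(value, unit):
    """Convert a whole-number value in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)

def how_many(small_unit, value, big_unit):
    """How many `small_unit`s fit into `value` of `big_unit`."""
    return to_bytes(value, big_unit) // to_bytes(1, small_unit)

# The predicted 40 zettabytes, expressed in the terabytes we handle daily.
print(how_many("TB", 40, "ZB"))  # 40000000000, i.e. 40 billion terabytes
```

Seen this way, the 40-zettabyte figure is seven orders of magnitude beyond the petabyte range most teams ever touch, which is why volume alone says little about your own organization.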
Unstructured data: others speak of the different producers of data available, such as social media, GPS, documents, RFID and so on. Gartner estimates the amount of unstructured data will increase to 650% of its current size by 2017. Again, it is an impressive number, but is it relevant and meaningful to you? This brings to mind discussions of infinity among mathematicians.
Advanced Analytics: other parties speak of predictive and prescriptive analytics, such as text analytics, trend & statistical analysis and simulation modeling. This is an excellent segue into a big data discussion, but you can’t possibly operate advanced analytics without first coming to terms with your data management challenges.
In-disk & In-memory: some parties, particularly vendors, start big data discussions by speaking of large-scale, massively parallel processing systems. In my view, this is a tool. You cannot purchase a product and claim you are in ‘Big Data’.
Let’s attempt to put this all together.
In Big Data, we need to be able to pull a wider variety of data, both structured and unstructured. We need to integrate, govern and manage that data in a scalable, extensible solution that allows for quick retrieval and dynamic models. And we need to enable sophisticated methods of predictive and prescriptive analytics to discover new and innovative ways to drive business and deliver transformational benefits.
It is a mouthful, surely, but if big data were easy, everyone would be successful in employing these solutions. As a matter of fact, a recent Capgemini study found that 73% of firms surveyed did not consider their big data project successful and only 8% of all respondents claimed to be “fully satisfied”. We will cover reasons for this in the next blog in our series. For now, let’s remain with the definition.
When considering the volume and variety arguments, my preference is to use the 80/20 rule to demonstrate the significance. Take into account that roughly 20% of the available digital universe for your organization is in your typical business intelligence environment and consists mainly of structured data. These are your operational systems, ERPs, call centers, business applications, data warehouses & marts, and so on. We can draw an interesting analogy between these traditional systems and an iceberg. As with the iceberg, only 20% is visible at the surface. If this visible cap is all you use to make navigation decisions, you may be in danger of sinking your craft!
If most of our analytics and decision support services today are determined by only 20% of the available data, what would be the result of incorporating the other 80%, the unstructured content? How could this benefit analytics, and what do we currently utilize to interpret and intelligently use the other types of data?
So, to follow this line of reasoning, beyond our traditional decision-making and business intelligence systems there are myriad systems and producers of data that should be considered when making business decisions or performing analysis. These producers of information can be both internal and external to our organization. Unstructured data sources may include, to name a few examples: email, instant messaging, social media, satellite data, regulatory and court documentation, logs, video, audio, blogs such as this one and, of course, enterprise content management systems. By incorporating and integrating these sources, we expand our powers of reasoning and analytics and enable innovative ways to query and interpret the data.
Defining Big Data
In summary, the definition of big data must cover a wider variety of data producers, a mechanism to link, consolidate and manage the data as well as the processes to convert data into actionable information to supply value to the business. In the next segment of this series, we will delve into the benefits and value add to the business that a successful big data engagement can bring to the organization.