Almost any definition of big data is based on Doug Laney’s original proposition of the 3 V’s –
Velocity (the data is growing rapidly),
Variety (the data comes from many sources – both structured and unstructured)
and Volume (the data is big).
For example, Gartner defines big data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
There are two principal challenges with the 3 V’s definition for big data:
1. It is focused on the technical characteristics of big data – useful as a starting point but not necessarily good for the maturing market. Big data must be understood as a business concept rather than as a technical concept if it is to add value in any organisation.
2. The “and/or” nature of the definition causes additional confusion. Any vendor with a solution that handles any one of the 3 V’s – for example, an existing database that can handle large volumes – is positioned as being a big data solution. Many vendors will focus on the one V that suits them and ignore the “that require new forms of processing” part of this definition. In almost every case, big data must have at least two of these characteristics.
Additional V’s have been added to continue the V theme, whilst improving the underlying definition.
Most commonly these include Veracity (the data must be accurate) and Value (the data must be of business significance). The addition of these V’s touches on some of the data governance aspects of dealing with big data but does not really address the challenges inherent in the initial definition.
In order to move forward in a maturing market, is it time to forget about the three V’s?
One approach is to look at what big data is not. Big data is not BI!
Big data solutions provide insight by inferring laws from the analysis of large sets of data with low information density.
BI, in contrast, is about analyzing data with high information density to discover trends, measure compliance with known rules, etc.
Newer definitions focus on these differences. For example, Wikipedia describes big data as “an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications.”
Yet, like the 3 V’s definition, this remains largely technical and somewhat vague.
Newer approaches to big data analytics make big data solutions quicker and easier to deploy than traditional BI solutions. They leverage the unstructured nature of Hadoop to allow data to be quickly and easily assimilated and analyzed.
Big data answers the “what if” questions.
Most business decisions are made based on gut feel or intuition. This is not because of a lack of interest in data-driven decision making. In fact, most large companies have invested hundreds of millions in data warehousing and BI projects with a view to improving analytics and decision making.
Yet, the simple reality is that the monitoring focus of BI does not support forward-thinking decision making.
An executive who says “I have this gut feeling… Can you build me a model to test my hypothesis?” will wait months for IT to define data models, integrate data, build the analyses and answer the question. The costs incurred in answering this kind of question are not viable. In any case, long before the answer is given, the decision, right or wrong, has been made and the business has moved on.
Big data’s strengths lie in answering these kinds of questions.
Unsurprisingly, many big data applications lie in the area of customer insight.
Companies that wish to improve their customer experience, maximise pricing, optimise their channel mix or reduce fraud cannot answer the ongoing “what if” questions that arise for each proposed change to existing strategies and approaches.
Big data is about providing rapid time to insight for questions that cannot be cost-effectively answered using traditional BI.
What do you think? Is it time to move towards a more business-oriented definition of big data?
Image sourced from http://en.wikipedia.org/wiki/Selective_exposure_theory