I recently had a client complain that he was struggling with a reporting requirement because his technology platform could not cope with “Big Data”. Upon investigation, I found he was dealing with approximately 20 million rows in a relatively small (ten-column) table – not what I would consider Big Data at all.
This raised the question – at what point does data become big data? In a recent post, Robin Bloor asks this very question.
His observation – current trends in the growth of data volumes are not new! Companies in sectors such as telecommunications and retail have historically managed significant volumes of data. His conclusion – whether or not a company has a genuine “big data” requirement, it will still have to manage ever-increasing volumes of data.
Robin goes on to discuss the impact of new technologies, such as Hadoop, and the complexities of managing not just large volumes but also a distributed architecture that combines Hadoop components, built for speed or to process unstructured data, with traditional database components.
His conclusion: you have Big Data if you are having problems managing it, irrespective of volume.
By this definition, many companies have big data issues today. Data management remains an afterthought for many companies, or is implemented at a tactical level that does not address enterprise complexities.
The complexities of Big Data require a more rigorous management approach if business benefits are to be realised. Trillium Software’s Nigel Turner highlights four foundation stones for exploiting Big Data in an article published in Database Marketing.
1.) The ability to identify the right data to solve the problem
2.) The ability to integrate and match varied data from multiple data sources
3.) The necessary IT infrastructure to support Big Data initiatives
4.) Having the right capabilities and skills to exploit the data
Of course, these are traditional pillars of Data Quality and Data Governance – a big data strategy should ultimately form part of the overall data management strategy. Simply throwing technology at the problem will create unnecessary cost and complexity and, frankly, we have enough of that already.