A friend recently updated her Facebook status to “engaged”. Shortly afterwards she noticed that the ads popping up around her Facebook page were wedding-oriented.
Another friend, after adding some photographs to her profile, was freaked out by the question: “Did you take these at Bob’s wedding in Nieu Bethesda?” She had. How did they know?
These kinds of analyses are examples of the emerging trend towards big data, and in particular towards embracing the data generated by social media, in order to drive competitive advantage. According to McKinsey Research, the use of this kind of data in a retail space can increase the operating margin by as much as 60%.
Yet bringing this kind of data into an enterprise creates serious challenges. According to IDC, there is already more data than there is space to store it. Approximately 90% of the data that is being created is unstructured: data that does not arise from transactional activities or fit into a relational database. Examples include email, video, spatial data, documents and spreadsheets.
According to IBM, the volume of ALL data that was created before 2003 is now being recreated every two days. The challenge for most organisations will be to separate the wheat from the chaff: of the myriad objects created daily, how will you filter out those that have value (or must be retained and protected for legal purposes) from those that are garbage?
Your data governance program should be planning for this challenge already.
Do you have a plan for managing unstructured data?
Can you answer questions such as: “Where is our critical data stored? Who created it? Who has access to it? Should they have access to it? Which of this data is redundant or stale and which of it must be kept?”
Your ability to answer this kind of question will be affected both by your governance policies (what are your policies for big data?) and by your technical ability to handle large volumes of data using data governance tools.
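To make the “redundant or stale” question concrete, here is a minimal sketch in Python, assuming the unstructured data sits on a file share you can walk like a local directory. It is not a substitute for a data governance tool. The function names, the 365-day threshold and the use of modification time as the staleness signal are all illustrative assumptions; your own retention policy would set the real rules.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

# Assumed threshold for illustration only; take the real value from your retention policy.
STALE_AFTER_DAYS = 365


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def audit(root, stale_after_days=STALE_AFTER_DAYS, now=None):
    """Walk `root` and return (stale_files, duplicate_groups).

    stale_files: files not modified within `stale_after_days`.
    duplicate_groups: lists of files whose contents are byte-for-byte identical.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    stale = []
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            stale.append(path)
        by_digest[file_digest(path)].append(path)
    duplicates = [paths for paths in by_digest.values() if len(paths) > 1]
    return stale, duplicates
```

Calling `audit("/path/to/share")` returns the two lists for review. A sketch like this only covers staleness and exact duplicates; who created a file and who should have access to it depend on the permissions model of the platform where the data lives.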
Of course, knowing where the data is stored is one thing. The other challenge that must be overcome is the quality of these huge volumes. A simple example: my friend had been engaged for months before she updated her status on Facebook, so by the time the wedding planners got her in their sights she was practically on her honeymoon. The volumes of data, combined with the untrusted nature of the source, mean that much of the data received is of no value.
Your business needs to decide how relevant each potential source of data is to your competitive advantage. Yes, big data can add real value, but the business that is not buried under a mountain of garbage may be able to make better decisions than one that is. Data quality will become even more critical as a differentiator as big data becomes pervasive.
This post was originally published in the Data Quality Matters blog