Data Governance in a Big Data world

5 Data Governance principles for Big Data

By Theresa Kushner

Let’s face it – we can’t escape Big Data! We can ignore it. We can tell our management that we have it under control. We can even try to make it smaller. But the fact remains – Big Data is here to stay and getting bigger every day. In this environment as data governance managers, we have to find ways to manage unstructured Big Data as well as we manage structured “little” data. So let’s start with the principles we hold dear about how to manage “little” data and hope that we can find some commonality, but also make sure we understand the differences.

Here are just a few important principles managed in a data governance program.

  1. Accountability
  2. Standardization
  3. Business Alignment
  4. Maintenance
  5. Access control

Most successful governance programs begin with accountability. Who in the organization is accountable for the data that is being governed? Big data is no different than any other data in this area. But finding the person to hold accountable can be tricky. Here are a few suggestions for selecting the right one:

  1. Whose job depends on information gleaned from big data? Marketing? Product development?
  2. Whose collection of big data will be the largest? Does web data in marketing outstrip the number of terabytes dedicated to product or support data collected?
  3. If several groups claim accountability for big data, who has more of a corporate perspective?

Once you’ve answered some of these key questions, make sure that the person you select understands what “accountability” means in your data governance world. Here’s a short definition: Accountability belongs to an individual who can make decisions about the collection, use and management of data.

Standardization in the big data world is almost an oxymoron. One of the attributes of big data is its variety. Standardizing big data is not necessary. What is needed, however, is good cataloging and management tools to ensure that data sources can be located and that once located can be used effectively. Organizing data sources with standardized metadata tags is as close as you might get to standardization. A good metadata management tool is important for any kind of data governance.
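As a rough illustration of cataloging sources with standardized metadata tags, here is a minimal sketch. The tag names (`owner`, `domain`, `format`), the classes and the example locations are all hypothetical, chosen only for this example; a real metadata management tool will have its own model.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: tag keys every source must carry.
REQUIRED_TAGS = {"owner", "domain", "format"}


@dataclass
class DataSource:
    name: str
    location: str
    tags: dict = field(default_factory=dict)


class Catalog:
    """Keeps track of data sources and lets analysts find them by tag."""

    def __init__(self):
        self._sources = {}

    def register(self, source):
        # Reject sources that lack any of the standardized tags.
        missing = REQUIRED_TAGS - set(source.tags)
        if missing:
            raise ValueError(f"{source.name} is missing tags: {sorted(missing)}")
        self._sources[source.name] = source

    def find(self, **criteria):
        # Return every source whose tags match all of the given criteria.
        return [
            s for s in self._sources.values()
            if all(s.tags.get(key) == value for key, value in criteria.items())
        ]


catalog = Catalog()
catalog.register(DataSource(
    "web_clickstream", "hdfs://example/clicks",
    {"owner": "marketing", "domain": "web", "format": "json"},
))
catalog.register(DataSource(
    "support_tickets", "hdfs://example/tickets",
    {"owner": "support", "domain": "service", "format": "text"},
))
web_sources = [s.name for s in catalog.find(domain="web")]
```

The data itself stays as varied as it arrives; only the tags that describe each source are standardized.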

Aligning to your business is also important to big data governance. This simply means that the information, and the key decisions made from the analyses of big data, are relevant, consistent and appropriate for meeting your business objectives. For example, mining big data for an executive who is “just curious” undermines the governance role and consumes resources, perhaps unnecessarily.

It’s important to know from the start what part big data plays in your overall data strategy. Will analyses from big data play a contextual role in your understanding of your customers or is it essential for understanding your web-based operations? What you decide to do with the insights you get from big data is key to how your governance program will be structured and managed.

Maintenance of big data probably represents the greatest difference from traditional structured data, especially when it comes to the metrics you are used to managing, such as completeness, consistency and overall quality. These metrics are, for the most part, unnecessary. And the definition of consistency may have to change. With big data, consistency may mean how information flows to your analytical environment, whereas with structured data it could be about maintaining record-to-record consistency of data attributes.

In addition, with big data there is usually no requirement to maintain a referential data source to check on how accurate your data is. These sources are often the last stream of information pulled so that consistency and accuracy are measured against the last stream of data you analyzed.

Consistency and accuracy, however, do have a place in ensuring that the metadata you use to tag big data sources maintain an environment for analysts that is consistent, easily accessed and maintained.

In the world of big data, the velocity of the data through your systems makes maintenance very difficult. Your governance now has to contend with which data sets are to be used, and when and how often they should flow to your analytical data source. Again, go back to your data strategy for support in how this should work for you. Do you need to absorb the data as it comes in, analyze it and provide insights immediately, or is some of the analysis done by the system itself, such as automated systems that analyze information downloaded from thousands of refrigerators in 13 different countries?

Archival and retrieval may not be as great a concern either, for two reasons: 1) the analyses done on big data are highly volatile and typically have a very short shelf life, and 2) privacy and data management laws are rapidly placing limits on how long big data sets CAN be maintained. You also need to watch how often these data sets are moved across country borders. That’s a big No-No in Europe, for example.

Maria Villa, Global Vice President of SAP in charge of Data Governance, noted in a recent article for Information Management (Data Governance and Big Data): “It is critically important to develop a lifecycle management strategy that includes archiving and deletion policies, business rules and IT automation. The company will not be able to store all this incoming data forever.”
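To make the idea of archiving and deletion policies concrete, here is a minimal sketch of an automated lifecycle rule. The retention windows and the function name are assumptions invented for this example; real values would come from your business rules and the privacy laws that apply to you.

```python
from datetime import date

# Hypothetical retention windows, in days. Real values come from policy and law.
ARCHIVE_AFTER_DAYS = 30
DELETE_AFTER_DAYS = 365


def lifecycle_action(collected_on, today):
    """Decide what to do with a data set based only on its age."""
    age_in_days = (today - collected_on).days
    if age_in_days >= DELETE_AFTER_DAYS:
        return "delete"
    if age_in_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "keep"


collected = date(2024, 1, 1)
fresh = lifecycle_action(collected, date(2024, 1, 10))
aging = lifecycle_action(collected, date(2024, 3, 1))
expired = lifecycle_action(collected, date(2025, 1, 1))
```

A rule this simple can run on a schedule against the catalog of data sets, so that the deletion policy is enforced by IT automation rather than by memory.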

Access control IS important for big data, and rules for how to manage that access should be in place BEFORE a big data source is created. Big data analysts are great miners. They can toss out lots of great finds from big data, but be aware that some of their finds may be loose correlations, not causal relationships, even though the data may seem to suggest otherwise. So the first rule is to ensure that those who have access to big data understand HOW mining it is different from mining in a structured environment. Because big data is not often used to drive financial results, there is little worry that access to it can cause damage to your company. It’s the connection of information and the insights that are drawn from the data that cause the issues. Take Target’s faux pas with the pregnant teenager, and the wrath and publicity the chain got for finding “facts” that probably should not have been capitalized on in a marketing campaign sent to her parents’ home.
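One way to picture "rules in place BEFORE a source is created" is a store that refuses to create a source without an access policy. This is only a sketch; the class, role and source names are hypothetical and not taken from any particular product.

```python
class AccessPolicyError(Exception):
    """Raised when a source is requested without any access rules."""


class GovernedStore:
    def __init__(self):
        # Maps each source name to the set of roles allowed to read it.
        self._policies = {}

    def create_source(self, name, allowed_roles):
        # The access rules must exist before the source does.
        if not allowed_roles:
            raise AccessPolicyError(
                f"refusing to create {name!r} without access rules"
            )
        self._policies[name] = set(allowed_roles)

    def can_read(self, name, role):
        # Unknown sources and unlisted roles are denied by default.
        return role in self._policies.get(name, set())


store = GovernedStore()
store.create_source("web_clickstream", ["marketing_analyst"])
analyst_allowed = store.can_read("web_clickstream", "marketing_analyst")
intern_allowed = store.can_read("web_clickstream", "intern")
```

A check like this covers who may read the data. It does not cover how the resulting insights are used, which is where the Target example went wrong.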

These 5 areas are just a small sample of the differences in data governance between big and small data.

This post was originally published by and is republished with permission. Theresa Kushner has been in data management, in various guises, for nearly 20 years. She has developed data governance capabilities in a number of global organisations, including IBM, Cisco and VMware. Theresa’s Data Governance courses, available through eLearningCurve.com, make her practical experiences available to you.
