Data Governance. The irony of metadata management!

Gary Allemann

Discover the irony of metadata management in data governance! Learn why metadata management is essential for creating context within a company. Explore the differences between technical and business metadata and the value it brings to the governance environment.

If a principal goal of data governance is to create the context for the use of data within a company (and it is, as discussed in the post Why do you need data governance?), then data governance must include the definitions of terms, so-called metadata management.

Questions to answer about metadata

Is there a difference between technical metadata (data flows, data models, data quality attributes, etc.) and business metadata, such as data policies, business glossaries and the like?

Is all metadata equally valuable?

Does all metadata need to be governed?

When two people are talking about metadata are they likely to be talking about the same thing?

The ambiguity of metadata

The irony for me is that metadata is by definition ambiguous – data about data – as discussed in the post What is metadata anyway?

The next time you are discussing metadata, make sure that the entire room is working with the same definition. Otherwise, you may just be adding to the confusion.

Once we have defined the technical context, we still need a business context.

A key finding of the 2014 Forrester Wave for Data Governance is that data governance has shifted from a technology management endeavour to a business imperative.

You can download the report from the link provided, or read more about it here – Data governance 2.0: Who has what it takes?

Redefine the way you look at data governance!

Data assets that cannot be easily used by business data stewards may as well not exist. Data governance must link business goals and policies to the technical metadata that defines the environment. This requires a new tool, the data stewardship platform, which connects the business and technical metadata layers.
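As a rough illustration of what linking the two layers can mean in practice, here is a minimal Python sketch. It is not how any particular data stewardship platform works: the `BusinessTerm` and `TechnicalAsset` classes, the `assets_governed_by` helper, and the system, table, column and policy names are all hypothetical. A business term carries its definition and governing policies and points at the physical columns that implement it, so a policy can be traced down to the data it applies to.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TechnicalAsset:
    # Technical metadata: where the data physically lives (hypothetical names).
    system: str
    table: str
    column: str


@dataclass
class BusinessTerm:
    # Business metadata: the agreed meaning and the policies that govern it,
    # plus links down to the technical assets that implement the term.
    name: str
    definition: str
    policies: list[str] = field(default_factory=list)
    implemented_by: list[TechnicalAsset] = field(default_factory=list)


def assets_governed_by(terms: list[BusinessTerm], policy: str) -> list[TechnicalAsset]:
    """Trace a business policy down to the physical columns it applies to."""
    assets = []
    for term in terms:
        if policy in term.policies:
            assets.extend(term.implemented_by)
    return assets


customer_id = BusinessTerm(
    name="Customer ID",
    definition="Unique identifier assigned to a customer at onboarding.",
    policies=["Personal data policy"],
    implemented_by=[
        TechnicalAsset("crm", "customers", "cust_id"),
        TechnicalAsset("billing", "accounts", "customer_no"),
    ],
)

for asset in assets_governed_by([customer_id], "Personal data policy"):
    print(f"{asset.system}.{asset.table}.{asset.column}")
```

Two differently named columns in two systems are found through the one business term; without that link, a steward would have to know both physical names up front.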

Image sourced from http://upload.wikimedia.org/wikipedia/commons/6/67/6123034166_card_catalog.jpg

Tags:

business context, polling

Responses to “Data Governance. The irony of metadata management!”

Bowie Muyutu

September 18

You raise an important question vis-à-vis what metadata is. Take, for example, your inclusion of data policies as business metadata. Some policies may run into tens of pages. Would the entire document constitute metadata? Methinks not. In principle, metadata should not be verbose. Thus a glossary of terms qualifies as metadata, but a full treatise on a topic is “data” and not “metadata”. So where do policies lie on this spectrum? I would say a list (titles/topics/headings) of the policies that govern, say, data would be captured as metadata, but the actual policy statement itself (especially if it runs into tens of pages) is the actual data. Another example would be a set of principles used to govern architecture. Beyond just the name, there is the statement, rationale and implications. Surely none of that needs to be captured as metadata?

1. Gary Allemann
  
  September 18
  
  Hi Bowie
  
  thank you for your comment
  
  I would agree that most companies would only capture a subset. However, when we start to get into, for example, some of the more complex legislation affecting financial institutions, we may want to capture more detail (or create links to the detail) in order to ensure compliance.
  
  It is the ability to link various (different) metadata elements to each other that adds real value in the governance environment. So does reuse of metadata: if one term depends on another, we should be able to nest definitions without having to recapture everything.
  
  Policies define how we will use data; the level of detail will depend on the requirement.
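One way to picture nesting definitions instead of recapturing them is sketched below in Python. It assumes a simple glossary in which a definition refers to another term by writing its name in braces, e.g. `{Customer}`, rather than copying its text. The `GLOSSARY` contents, the brace convention and the `expand` function are hypothetical illustrations, not a feature of any specific metadata tool. Each definition is captured once; `expand` resolves the references recursively and refuses circular definitions.

```python
import re

# Hypothetical glossary: a definition refers to another term as {Term Name}
# instead of repeating that term's text, so each definition is captured once.
GLOSSARY = {
    "Customer": "A party that has bought at least one product",
    "Active Customer": "A {Customer} with a purchase in the last 12 months",
    "High-Value Customer": "An {Active Customer} whose 12-month spend exceeds the agreed threshold",
}

REF = re.compile(r"\{([^{}]+)\}")


def expand(term: str, glossary: dict[str, str], seen: tuple[str, ...] = ()) -> str:
    """Resolve nested references to other terms, rejecting circular definitions."""
    if term in seen:
        raise ValueError(f"circular definition: {' -> '.join(seen + (term,))}")
    text = glossary[term]  # KeyError if the term was never captured
    return REF.sub(
        lambda m: f"{m.group(1)} ({expand(m.group(1), glossary, seen + (term,))})",
        text,
    )


print(expand("High-Value Customer", GLOSSARY))
```

If the definition of "Customer" changes, every term built on it picks up the change the next time it is expanded, which is the reuse the reply describes.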
  
John O’Gorman

September 19

@Gary – It’s like the chicken and egg…but I digress. 😀

I’m presenting at the IAIDQ / ECCMA conference next month and one of the themes of my talk is this very question. To prime the pump a little, I am going to offer up a way of looking at the information –> data –> metadata continuum, with thanks to Dr. Barry Devlin for helping me form this idea, this way:

It’s all information. In its purest form, information is ‘out here’. It is language, thoughts and the human experience. Data is technically (Barry uses: ‘digitally’) enhanced information. What that implies straight away is that all data is referential: if we have a word, or name or phrase captured on a rock, a book or a database table, it is data. Metadata, then, is data used to reference (as a set) itself. (It also implies that ‘any’ information can be enhanced in any number of ways to become data.) To some of you, saying that metadata is self-referencing may sound like a return to ‘metadata is data about data’, but here is where the difference lies.

As a discipline, I believe we should be talking about what metadata *does* as opposed to what it is. Metadata is an accelerant. It reduces the amount of energy it takes to turn data back into information (useful energy), which brings me to my next ‘aha’. If data is enhanced information and metadata is an accelerant, that would mean that information is the amount of energy data makes available to do useful work in the enterprise. So information is energy, data is mass and metadata is acceleration…hmmm, where have we seen that combination before? I = dm²!

1. Gary Allemann
  
  September 19
  
  @John Nice analogy and thanks for commenting.
  
  I think that the purpose for which distinct metadata is used can help to categorise it. When we talk about metadata we are, by definition, ambiguous – it is a grouping of various descriptions and terminologies.
  
  When metadata has a purpose, then it has context. So a business term, for example, may only be relevant for a specific business area, not for the whole business. Similarly, the data that represents that business term may differ in different parts of the business. Are we describing all data, some data, part of some data? In which context? What relationships must it have in order to be valid? Etc. Each of these questions will deliver metadata with different purposes – all valid but different.
  
David Eddy

September 19

Gary –

>
> When two people are talking about metadata are they likely to be talking about the same thing?
>

For my response see this clip from Tom Cruise’s “Top Gun”:

About 1m 50s in, Goose responds to Iceman’s 2nd place remark…

I await my first conversation re: “Are we talking about the same kind of metadata?”

Here was my abandoned attempt to contribute to understanding one facet of metadata.
https://en.wikipedia.org/wiki/Talk:Metadata#ZIP_Code:_new_example_please

As you can see I was voted down.

At one point Charlie Betz put in a lot of work on the Wikipedia metadata page. That’s all gone.

In the spirit of learning how other folks look at metadata, I’ve just finished an online course on Coursera about metadata. The instructor was from the school of information sciences (i.e. library school) at the University of North Carolina. Very engaging. While it was certainly interesting & useful, there wasn’t a syllable about what I guess I should call “systems metadata.” Google that & see what you get… not much.

One of the course highlights was someone at the Getty Museum talking about how laborious hand curated metadata is. She had an excellent example of how they have 15+ names for Michelangelo.

John O’Gorman

September 20

Metadata is a function, like ‘pump’ or ‘lever’…the word is not ambiguous at all. If a group of terms, names or phrases is *put to use as metadata* that changes neither the definition of the word nor the terms, names or phrases used in that way. The exact same terms, names and phrases can be put to use as entities or column names. The fact that a word (like table) or a Name like Michelangelo has multiple interpretations is sometimes inconvenient but totally irrelevant to its identity.

There was a paper a while back on the word ‘container’ for example, which tried to demonstrate how ambiguous that word is because, to paraphrase: “anything, from a bucket, to a tin can to your cupped hands qualifies as ‘a container’”. When defined as a function, however, all the ambiguity disappears. Why is metadata not treated the same way?

As a discipline, we seem to be hell-bent on constraining metadata to a given set or type of values, and we confuse things even further by adding qualifiers like ‘operational’ and ‘corporate’ and ‘transactional’. Hanging modifiers on a term we can’t even define seems unusual for a group dedicated to clarity and quality.

1. Gary Allemann
  
  September 22
  
  The confusion comes precisely when we try to use the term. I have seen a group of data architects arguing over how to manage metadata, each vehemently disagreeing with the others, only to find that they were arguing over different concepts, each of which may in fact require differing approaches.
  
  To use your analogy of a container, the term is completely unambiguous when used to describe a generic function.
  
  However, while a pocket is a great vehicle to hold a mobile phone it is a lousy container to hold water.
  
  Any discussion about metadata should start by clearly identifying what that means in the context of the discussion.
  
  Are we talking about Policies, Processes, Models, Definitions (technical or business), attributes, rules, etc?
  
deddy5151

September 20

Don’t know what happened to that YouTube address, that’s the wrong video.

Maybe this will work: http://www.youtube.com/watch?v=p890hIa1w9k&index=2&list=PLEC2F8A54CFBD3E21

re: multiple names for the same thing. Identity is irrelevant if the person looking for it can’t find it & is unaware there are multiple spellings of the same or close to same thing.

How am I to know that “kicks” is the British word for “cics?”

John O’Gorman

September 22

@Gary – My point exactly: the definition of metadata as a *function* is easy; and agreed that *instances* of metadata likewise need to be nailed down (disambiguated) so I’m still struggling with where the problem lies.

@Deddy5151 – Again, you are supporting my argument that disambiguation (via identity and classification of terminology) is job one. If your application (or even a conversation) uses different terms for the same concept, you had best get that ambiguity out of the way sooner rather than later, right?

1. Gary Allemann
  
  September 23
  
  @John – Agreed.
  
2. deddy5151
  
  September 23
  
  John –
  
  >
  > disambiguation (via identity and classification of terminology) is job one.
  >
  
  Here’s a little case… clues sprinkled around… http://www.tdan.com/view-articles/6123
  
  While certainly an essential requirement, disambiguation is “just” one of many tasks.
  
  The real question is have you seen an organization (with its many work groups, divisions, departments, product lines, applications, silos, etc.) actually nail the disambiguation challenge?
  
  For me the hook appeared (I didn’t know it was sunk) in 1980 at a life insurance company long known for its devotion to data management efforts. They’d installed & well populated a “data dictionary” (what today is labeled a “metadata repository”) & found 70 different names / labels for the core business concept: “policy number.”
  
  I think the single most prominent / common approach I’ve seen is when a data bigot picks up the reins & declares there will be a single naming standard across the enterprise. People roll their eyes & ignore it.
  
  The case (I wrote about it, I did NOT do the work) had several aspects that are extremely difficult to reproduce today… size (the organization was only about 50 in IT & the mainframe was the only choice) & the “data controller” (this was loooooooong before formal titles like “data steward”) was a Marine.
  
  How does one make disambiguation happen today?
  
John O’Gorman

September 23

@David – Good points all around, and the article certainly demonstrates how the corporate need for compliance, control and consistency on one hand has to be balanced against human nature and the need for autonomy on the other – especially in the application development field.

I’ve posted this sentiment elsewhere, but I think it’s worth repeating: information, in its most elemental forms (terms, names, acronyms, abbreviations, etc.) is free to be used in whichever way a designer (or developer) chooses. This means the string: “David Eddy” can be ‘put to work’ as an entity, an attribute, a folder name, a metadata value, etc. If a business unit uniquely identifies that string (so that it represents the same reference anywhere it appears) as having value and classifies it as a representation of a Person, then we are on our way to a more robust information management environment. Representations of the same real world entity (e.g. deddy5151, “Eddy, Dave” etc) are treated as Equivalents. The same technique can be applied to translations, acronyms and alternate forms (dates, for example.)

The flipside technique for establishing disambiguation routines is as simple as recognizing that while there are several dialects spoken in business (HR, Engineering, IT and Legal to name a few), there are a finite number of classes that information elements can occupy.

Where is it happening? Everywhere I go!
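The identity-and-equivalents idea in the comment above can be sketched in a few lines of Python. This is one possible reading of it, not the commenter’s actual method: the `Identity` and `Vocabulary` classes are hypothetical, and matching ignores only case and surrounding whitespace. Each uniquely identified reference is classified (here as a `Person`), and every alternate form is registered as an equivalent that resolves to the same identity. Registering a form that already points at a different identity is refused, so ambiguity is surfaced rather than silently absorbed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    # One uniquely identified real-world reference and the class it belongs to.
    canonical: str
    cls: str


class Vocabulary:
    """Hypothetical registry mapping many surface forms to one identity each."""

    def __init__(self) -> None:
        self._by_form: dict[str, Identity] = {}  # normalised form -> Identity

    @staticmethod
    def _key(form: str) -> str:
        # Deliberately light normalisation: case and surrounding whitespace only.
        return form.strip().casefold()

    def register(self, identity: Identity, *equivalents: str) -> None:
        forms = (identity.canonical, *equivalents)
        # Check every form first, so a conflict leaves the registry unchanged.
        for form in forms:
            existing = self._by_form.get(self._key(form))
            if existing is not None and existing != identity:
                raise ValueError(f"{form!r} already identifies {existing.canonical!r}")
        for form in forms:
            self._by_form[self._key(form)] = identity

    def resolve(self, form: str) -> Optional[Identity]:
        return self._by_form.get(self._key(form))


vocab = Vocabulary()
david = Identity("David Eddy", "Person")
vocab.register(david, "deddy5151", "Eddy, Dave")
print(vocab.resolve("DEDDY5151"))
```

Applied to the 1980 case described earlier in the thread, the 70 labels for “policy number” could each be registered as equivalents of a single identity, instead of forcing every group onto one enterprise-wide name.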


Discover more from Data Quality Matters