12 thoughts on “Data Governance. The irony of metadata management!”

  1. You raise an important question vis-a-vis what is metadata. Take for example your inclusion of data policies as business metadata. Some policies may run into tens of pages. Would the entire document constitute metadata? Methinks not. In principle metadata should not be verbose. Thus a glossary of terms qualifies as metadata but a full treatise on a topic is “data” and not “metadata”. So where do policies lie on this spectrum? I would say a list (titles/topics/headings) of policies that govern, say, data would be captured as metadata but the actual policy statement itself (especially if it runs into tens of pages) is the actual data. Another example would be a set of principles used to govern architecture. Beyond just the name, there is the statement, rationale and implications. Surely none of that needs to be captured as metadata?

    1. Hi Bowie

      Thank you for your comment.

      I would agree that most companies would only capture a subset. However, when we start to get into, for example, some of the more complex legislation impacting financial institutions, then we may want to capture more detail (or create links to the detail) in order to ensure compliance.

      It is the ability to link various (different) metadata elements to each other that adds real value in the governance environment. So does reuse of metadata: if one term depends on another, we should be able to nest definitions without having to recapture everything.

      Policies define how we will use data – the level of detail will depend on the requirement.
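The point about nesting definitions rather than recapturing them can be illustrated with a minimal sketch. It is only one possible reading of the reply above: the `Term` class, the `expand` function and the sample glossary entries are all hypothetical and do not come from any particular metadata tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A glossary entry that may link to other entries by name."""
    name: str
    definition: str
    depends_on: list[str] = field(default_factory=list)


def expand(name: str, glossary: dict[str, Term], seen: set[str] | None = None) -> list[str]:
    """Return the definition of `name`, then the definitions it links to.

    Each definition is stored once and reached through links; a term that
    has already been visited is skipped, which also stops circular links.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return []
    seen.add(name)
    term = glossary[name]
    lines = [f"{term.name}: {term.definition}"]
    for dependency in term.depends_on:
        lines.extend(expand(dependency, glossary, seen))
    return lines


# Illustrative entries only (assumed, not taken from the discussion).
glossary = {
    "account": Term("account", "a record of balances held for a party"),
    "customer": Term("customer", "a party that holds at least one account", ["account"]),
    "active customer": Term(
        "active customer", "a customer with a transaction in the last 12 months", ["customer"]
    ),
}

for line in expand("active customer", glossary):
    print(line)
```

Here "active customer" reuses the stored definition of "customer", which in turn reuses "account", so nothing is captured twice. A link to a full policy document could be held the same way, as a reference rather than as copied text.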

  2. @Gary – It’s like the chicken and egg…but I digress. 😀

    I’m presenting at the IAIDQ / ECCMA conference next month and one of the themes of my talk is this very question. To prime the pump a little, I am going to offer up a way of looking at the information –> data –> metadata continuum, with thanks to Dr. Barry Devlin for helping me form this idea, this way:

    It’s all information. In its purest form, information is ‘out here’. It is language, thoughts and the human experience. Data is technically (Barry uses: ‘digitally’) enhanced information. What that implies straight away is that all data is referential: if we have a word, name or phrase captured on a rock, in a book or in a database table, it is data. Metadata, then, is data used to reference (as a set) itself. (What it also implies is that ‘any’ information can be enhanced in any number of ways to become data.) To some of you, saying that metadata is self-referencing takes us back to ‘metadata is data about data’, but here is where the difference lies.

    As a discipline, I believe we should be talking about what metadata *does* as opposed to what it is. Metadata is an accelerant. It reduces the amount of energy it takes to turn data back into information (useful energy) again, which brings me to my next ‘aha’. If data is enhanced information and metadata is an accelerant, that would mean that information is the amount of energy data makes available to do useful work in the enterprise. So information is energy, data is mass and metadata is acceleration…hmmm where have we seen that combination before? I = dm²!

    1. @John Nice analogy and thanks for commenting.

      I think that the purpose for which distinct metadata is used can help to categorise it. When we talk about metadata we are, by definition, ambiguous – it is a grouping of various descriptions and terminologies.

      When metadata has a purpose then it has context. So a business term, for example, may only be relevant for a specific business area, not for the whole business. Similarly, the data that represents that business term may differ in different parts of the business. Are we describing all data, some data, part of some data? In which context? What relationships must it have in order to be valid? Etc. Each of these questions will deliver metadata with different purposes – all valid but different.

  3. Gary –

    > When two people are talking about metadata are they likely to be talking about the same thing?

    For my response see this clip from Tom Cruise’s “Top Gun”:

    About 1m 50s in, Goose responds to Iceman’s 2nd place remark…

    I await my first conversation re: “Are we talking about the same kind of metadata?”

    Here was my abandoned attempt to contribute to understanding one facet of metadata.

    As you can see I was voted down.

    At one point Charlie Betz put in a lot of work on the Wikipedia metadata page. That’s all gone.

    In the spirit of learning how other folks look at metadata, I’ve just finished an online course on Coursera about metadata. The instructor was from the school of information sciences (e.g. library school) at University of North Carolina. Very engaging. While it was certainly interesting & useful, there wasn’t a syllable about what I guess I should call “systems metadata.” Google that & see what you get… not much.

    One of the course highlights was someone at the Getty Museum talking about how laborious hand curated metadata is. She had an excellent example of how they have 15+ names for Michelangelo.

  4. Metadata is a function, like ‘pump’ or ‘lever’…the word is not ambiguous at all. If a group of terms, names or phrases is *put to use as metadata* that changes neither the definition of the word nor the terms, names or phrases used in that way. The exact same terms, names and phrases can be put to use as entities or column names. The fact that a word (like table) or a name like Michelangelo has multiple interpretations is sometimes inconvenient but totally irrelevant to its identity.

    There was a paper a while back on the word ‘container’ for example, which tried to demonstrate how ambiguous that word is because, to paraphrase: “anything, from a bucket, to a tin can to your cupped hands qualifies as ‘a container’”. When defined as a function, however, all the ambiguity disappears. Why is metadata not treated the same way?

    As a discipline we seem to be hell-bent on constraining metadata to a given set or types of values and we confuse things even further by adding qualifiers like: ‘operational’ and ‘corporate’ and ‘transactional’. Hanging modifiers on a term we can’t even define seems unusual for a group dedicated to clarity and quality.

    1. The confusion comes precisely when we try to use the term. So I have seen a group of data architects arguing over how to manage metadata, each vehemently disagreeing with the other, only to find that they are arguing over different concepts – each of which may in fact require differing approaches.

      To use your analogy of a container – the term is completely unambiguous when used to describe a generic function.

      However, while a pocket is a great vehicle to hold a mobile phone it is a lousy container to hold water.

      Any discussion about metadata should start by clearly identifying what that means in the context of the discussion.

      Are we talking about Policies, Processes, Models, Definitions (technical or business), attributes, rules, etc?

  5. @Gary – My point exactly: the definition of metadata as a *function* is easy; and agreed that *instances* of metadata likewise need to be nailed down (disambiguated) so I’m still struggling with where the problem lies.

    @Deddy5151 – Again, you are supporting my argument that disambiguation (via identity and classification of terminology) is job one. If your application (or even a conversation) uses different terms for the same concept you had best get that ambiguity out of the way sooner rather than later, right?

    1. John –

      > disambiguation (via identity and classification of terminology) is job one.

      Here’s a little case… clues sprinkled around… http://www.tdan.com/view-articles/6123

      While certainly an essential requirement, disambiguation is “just” one of many tasks.

      The real question is have you seen an organization (with its many work groups, divisions, departments, product lines, applications, silos, etc.) actually nail the disambiguation challenge?

      For me the hook appeared (I didn’t know it was sunk) in 1980 at a life insurance company long known for its devotion to data management efforts. They’d installed & well populated a “data dictionary” (what today is labeled “metadata repository”) & found 70 different names / labels for the core business concept: “policy number.”

      I think the single most prominent / common approach I’ve seen is when a data bigot picks up the reins & declares there will be a single naming standard across the enterprise. People roll their eyes & ignore such.

      The case (I wrote about it, I did NOT do the work) had several aspects that are extremely difficult to reproduce today… size (the organization was only about 50 in IT & the mainframe was the only choice) & the “data controller” (this was loooooooong before formal titles like “data steward”) was a Marine.

      How does one make disambiguation happen today?

  6. @David – Good points all around, and the article certainly demonstrates how the corporate need for compliance, control and consistency on one hand is offset by human nature and the need for autonomy on the other – especially in the application development field.

    I’ve posted this sentiment elsewhere, but I think it’s worth repeating: information, in its most elemental forms (terms, names, acronyms, abbreviations, etc.) is free to be used in whichever way a designer (or developer) chooses. This means the string: “David Eddy” can be ‘put to work’ as an entity, an attribute, a folder name, a metadata value, etc. If a business unit uniquely identifies that string (so that it represents the same reference anywhere it appears) as having value and classifies it as a representation of a Person, then we are on our way to a more robust information management environment. Representations of the same real world entity (e.g. deddy5151, “Eddy, Dave” etc) are treated as Equivalents. The same technique can be applied to translations, acronyms and alternate forms (dates, for example.)

    The flipside technique for establishing disambiguation routines is as simple as recognizing that while there are several dialects spoken in business (HR, Engineering, IT and Legal to name a few) there are a finite number of classes that information elements can occupy.

    Where is it happening? Everywhere I go!
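The Equivalents idea described in the comment above (one uniquely identified string, classified as a Person, with other representations mapped to it) might look something like the following sketch. It is an assumption-laden illustration, not the commenter's actual method: the `EquivalenceRegistry` class and its method names are hypothetical, and treating case and extra whitespace as insignificant is a simplification chosen for the example.

```python
from __future__ import annotations


class EquivalenceRegistry:
    """Maps every known representation of an entity to one canonical identifier."""

    def __init__(self) -> None:
        self._canonical: dict[str, str] = {}  # normalised representation -> canonical id
        self._classes: dict[str, str] = {}    # canonical id -> class, e.g. "Person"

    @staticmethod
    def _normalise(text: str) -> str:
        # Assumption: case and repeated or surrounding whitespace carry no meaning.
        return " ".join(text.split()).casefold()

    def register(self, canonical_id: str, entity_class: str, *equivalents: str) -> None:
        """Classify `canonical_id` and record its equivalent representations."""
        self._classes[canonical_id] = entity_class
        for representation in (canonical_id, *equivalents):
            self._canonical[self._normalise(representation)] = canonical_id

    def resolve(self, text: str) -> tuple[str, str] | None:
        """Return (canonical id, class) for a representation, or None if unknown."""
        canonical_id = self._canonical.get(self._normalise(text))
        if canonical_id is None:
            return None
        return canonical_id, self._classes[canonical_id]


registry = EquivalenceRegistry()
registry.register("David Eddy", "Person", "deddy5151", "Eddy, Dave")

print(registry.resolve("eddy, dave"))  # ('David Eddy', 'Person')
print(registry.resolve("Goose"))       # None
```

The same structure could, in principle, hold the 70 labels for “policy number” mentioned earlier in the thread as equivalents of a single classified term. Deciding which labels really are equivalent remains human curation work that the code does not attempt.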
