In metadata management we often talk about dictionaries and glossaries.
What is the difference and what is the relationship between these?
If we look at dictionary definitions we see that a dictionary is a work in which words and terms are defined and listed in alphabetical order.
A glossary is a mini-dictionary: in effect, an alphabetical listing of terms related to a specific subject, e.g. a sporting glossary or a business glossary.
What is a business glossary?
For data to be meaningful, its context needs to be clear.
Sounds easy, but rarely do people across the enterprise share a common understanding of even basic business terms like “customer” or “product.” And even if they do today, those meanings change over time.
A Business Glossary defines terms across a business domain, providing an authoritative source for all business operations, including the database systems that support them.
It helps stakeholders across your organization collaboratively agree on the definitions, rules, and policies that define your data and manage reviews and approvals. And it makes this information easily accessible to every data citizen for faster adoption.
What is a data dictionary?
A data dictionary is a collection of descriptions of the data objects or items in a data model for the benefit of programmers and others who need to refer to them.
It describes the structure of a piece of data, its relationship to other data, and its origin, format, and use.
The Data Dictionary serves as a searchable repository for people who need to understand how and where data is stored and how it can be consumed. With it, you can also document roles and responsibilities and launch appropriate workflows to define, describe, and map data.
What is the relationship?
Business terms provide context and meaning to the data dictionary.
A technical field represents or instantiates a business term.
For example, we may have a business term “Date of Birth” which is defined as “The date on which the party was born as per their birth certificate”.
Date of birth may be represented in a CRM system in the schema “Contact” with the field name “dob” and stored as a date field.
In the Finance system, date of birth may be stored in the schema “CustomFields”, in the field “CustomAttr1”, with the format text.
Business Term | Definition | CRM | Finance |
Date of Birth | The date on which the party was born as per their birth certificate | Contact.dob | CustomFields.CustomAttr1 |
In practice, not every business term will have a technical representation. Birth Certificate (from the example above) may be defined for clarity, but may not be represented in my data model. However, most business terms may have many technical representations across different systems and reports.
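The link between a business term and its technical fields can be sketched as a small data structure. This is only an illustrative sketch in Python: the class names `BusinessTerm` and `TechnicalField` and the helper `qualified_names` are hypothetical and do not come from any particular metadata tool, and the placeholder definition for Birth Certificate is an assumption, since none is given here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TechnicalField:
    """One data dictionary entry: where a value physically lives (hypothetical model)."""
    system: str
    schema: str
    name: str
    data_type: str


@dataclass
class BusinessTerm:
    """One business glossary entry, linked to zero or more technical fields."""
    name: str
    definition: str
    representations: list[TechnicalField] = field(default_factory=list)


def qualified_names(term: BusinessTerm) -> list[str]:
    """Return schema.field names for every technical representation of a term."""
    return [f"{f.schema}.{f.name}" for f in term.representations]


date_of_birth = BusinessTerm(
    name="Date of Birth",
    definition="The date on which the party was born as per their birth certificate",
    representations=[
        TechnicalField("CRM", "Contact", "dob", "date"),
        TechnicalField("Finance", "CustomFields", "CustomAttr1", "text"),
    ],
)

# A term defined for clarity only, with no technical representation.
# The definition text is a placeholder, not taken from the post.
birth_certificate = BusinessTerm(
    name="Birth Certificate",
    definition="(placeholder definition)",
)

print(qualified_names(date_of_birth))      # ['Contact.dob', 'CustomFields.CustomAttr1']
print(qualified_names(birth_certificate))  # []
```

Keeping the definition on the term and the storage details on the field mirrors the split between glossary and dictionary: one term, many (or no) representations.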
How do I use this?
Let’s assume, for example, that I need to do an age analysis of my customers.
I can define Customer Age as being the current date minus the customer’s date of birth. As a business user, this allows me to define my requirement in plain language, without being tied to the underlying data complexities.
The relationship above allows me to find the different representations accessible in different systems – even when the field names are not clear (such as CustomAttr1 in the example). In many off-the-shelf applications underlying field names may not be meaningful, or, in some cases, fields may no longer be used for their original purpose.
Accessing the dictionary would allow the technical programmers to identify that CustomAttr1 must be converted to a date before the calculation can be completed.
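As a rough illustration of that conversion step, the sketch below computes Customer Age from both representations. Several things here are assumptions rather than facts from the example: that "current date minus date of birth" means age in completed years, that the text in CustomAttr1 is formatted as YYYY-MM-DD, and the function names `parse_text_date` and `customer_age`, which are hypothetical.

```python
from datetime import date, datetime


def parse_text_date(raw: str, fmt: str = "%Y-%m-%d") -> date:
    """Convert a text value to a date. The YYYY-MM-DD default is an assumption;
    the real format would be recorded in the data dictionary."""
    return datetime.strptime(raw.strip(), fmt).date()


def customer_age(date_of_birth: date, today: date) -> int:
    """Customer Age in completed years: current date minus date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


crm_dob = date(1990, 6, 15)                  # Contact.dob is already a date
finance_dob = parse_text_date("1990-06-15")  # CustomFields.CustomAttr1 is text

today = date(2024, 6, 14)  # fixed date so the example is repeatable
print(customer_age(crm_dob, today))      # 33
print(customer_age(finance_dob, today))  # 33
```

The business definition stays the same for both systems; only the parsing step differs, and the dictionary is what tells the programmer that the Finance field needs it.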
How does Stewardship work?
Separating my business glossary from my data dictionaries means that I can assign different stewards for business and technical terms.
Governance and stewardship should seek to ensure that critical information is captured, that technical and business responsibilities are clear, and that relevant relationships between business and technical assets are captured.
The post “From business glossary to business authority” has more on this topic.
This is a good read, thank you Gary.
I am wondering though how you feel about including synonyms in the glossary.
Also, would you for example include First Name, Beneficiary and Beneficiary First Name in your glossary?
Hi Jaunine. Yes, I would certainly include synonyms that are in common use in my business. I would suggest creating the primary term, e.g. First Name, that is most preferred, and doing my definitions at this level only. Synonyms would be linked to this term.
When the focus of a particular project is on a Java/JS web application with an internal SQL database, we can generate the DB model from Eclipse, but it does not contain the range values of the column types. In such a context, it should be OK to list table names, the UML table class definitions, and the table creation DB schema as DDL statements. This can be called the Java web style of data dictionary, as this DB API is internal, not to be called by users who call the REST API of the web application. All the range and value checks will be caught in real time at the GUI front end of the web application.
Any comments?
Hi Sushil – thanks for your question. I would suggest that you would want to capture the range / allowed values for each column as rules linked to your glossary. Fundamentally, the metadata is documentation that allows any stakeholder to quickly and easily understand the data they are looking for. You may think of this as your Business Requirements Definition that your programmer would use to define the real-time checks.