Reference Data management – when men are dressed as women.

Implement proactive measures to address data quality issues in your reference data by conducting regular Data Quality Audits and Assessments and ensuring continuous improvement in data management practices.

There are a lot of similarities between reference data and master data – both can be defined as an abstraction of a real-world person or thing. In some cases, the terms are used interchangeably.

What is Reference Data?

Reference data is most commonly used to describe reusable lists of codes and descriptions stored in a database in the form of “lookup tables”, for example, the standard ISO 3166-1 list of country codes and descriptions. In many cases, this data represents information that is external to the organisation.

Some people are starting to use the term as a synonym for master data – for example, your company’s definitive list of client data could be regarded as reference data for other applications looking for the latest, most correct information for a specific client. This data is largely internal – it is unlikely that it would be reusable outside of your organisation.

The characteristics of reference data are:

It provides standardized codes and classifications.
It is used for data validation, data integration, and reporting purposes.
It is generally maintained by external sources or regulatory bodies.
It can be specific to an industry or domain.

What is Master Data?

Master data refers to the core and essential data that is critical for the operations and decision-making processes of an organization.

It represents the key entities and attributes that are used to manage and support business activities. Master data typically includes information about customers, products, suppliers, employees, locations, and other foundational elements of an organization’s business environment.

Master data serves as a central point of reference for various systems and processes within an organization. It provides a consistent and standardized view of important data across different departments and business functions. By maintaining accurate and up-to-date master data, organizations can ensure data integrity, improve operational efficiency, and make informed decisions based on reliable information.

The characteristics of master data are:

It is typically created and managed by the organization.
It is used as a primary source of information across different systems and processes.
It requires data governance practices to ensure data integrity and quality.
It changes infrequently and is considered relatively stable over time.

What is the difference between master data and reference data?

In summary, master data represents the core entities and attributes essential to an organization, while reference data provides the codes and classifications used to categorize data within an organization’s systems. Master data is managed internally by the organization, while reference data is often maintained by external sources or regulatory bodies.

The problem with standards!

International standards are not always standards!

For example, ISO 3166-1 allows a 2-letter code, a 3-letter code and a numeric code – each of which includes user-defined codes. This means that two systems, each of which uses the “same” international standard, may have different codes for the same country.
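To make the user-defined code problem concrete, here is a minimal Python sketch. ISO 3166-1 sets aside the two-letter values AA, QM to QZ, XA to XZ and ZZ for user assignment, so their meaning is whatever each system decides. The helper names and the two sample system tables are hypothetical, chosen only for illustration.

```python
def is_user_assigned_alpha2(code: str) -> bool:
    """Return True if code falls in an ISO 3166-1 alpha-2 user-assigned range."""
    code = code.strip().upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        return False
    if code in ("AA", "ZZ"):
        return True
    first, second = code
    if first == "Q" and "M" <= second <= "Z":
        return True
    return first == "X"


def conflicting_user_codes(a: dict, b: dict) -> list:
    """List user-assigned codes that two lookup tables describe differently."""
    return sorted(
        code
        for code in a.keys() & b.keys()
        if is_user_assigned_alpha2(code) and a[code] != b[code]
    )


# Hypothetical lookup tables from two systems that both "use ISO 3166-1".
# XK is commonly used for Kosovo, but the standard itself does not define it.
system_a = {"ZA": "South Africa", "XK": "Kosovo"}
system_b = {"ZA": "South Africa", "XK": "Unknown country"}

conflicts = conflicting_user_codes(system_a, system_b)
```

Here `conflicts` contains only `XK`: both systems agree on the officially assigned code and disagree on the user-assigned one.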

There are multiple standards

Although the ISO standard is becoming generally accepted for countries, various other standards – such as the NATO list – are in common use. These different standards may share many values, making it difficult to determine which code is in use.

There are multiple versions of the truth

Standards are revised and updated and, depending on the date when your systems were implemented, you may have the same code meaning different things.
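One way to handle revisions is to store each code with the period in which it was valid, and to look it up as of the date on the record. The sketch below assumes a simple in-memory history; the code “QQ”, its meanings and its dates are entirely hypothetical.

```python
from datetime import date

# Hypothetical code history: (code, valid_from, valid_to_exclusive, meaning).
# A valid_to of None means the entry is still current.
CODE_HISTORY = [
    ("QQ", date(1990, 1, 1), date(2005, 1, 1), "Region A"),
    ("QQ", date(2005, 1, 1), None, "Region B"),
]


def meaning_on(code, as_of, history=CODE_HISTORY):
    """Return what a code meant on a given date, or None if it was not valid."""
    for entry_code, valid_from, valid_to, meaning in history:
        if entry_code != code:
            continue
        if valid_from <= as_of and (valid_to is None or as_of < valid_to):
            return meaning
    return None
```

A record captured in 2001 with code “QQ” then resolves to “Region A”, while the same code on a 2010 record resolves to “Region B”.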

From a data governance perspective, it is important to understand, particularly when aggregating data, which standards are in use and how you will translate from one to another.
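A translation between two code sets can be as simple as a crosswalk table, provided that anything the crosswalk cannot translate is reported rather than silently dropped. The sketch below maps a handful of ISO 3166-1 two-letter codes to their three-letter equivalents; the function name `translate_codes` is an assumption, and the table is deliberately incomplete.

```python
# ISO 3166-1 alpha-2 to alpha-3 for a handful of countries (not a complete list).
ALPHA2_TO_ALPHA3 = {
    "DE": "DEU",
    "GB": "GBR",
    "SA": "SAU",
    "US": "USA",
    "ZA": "ZAF",
}


def translate_codes(codes, crosswalk=ALPHA2_TO_ALPHA3):
    """Translate codes via the crosswalk; return (translated, unmapped)."""
    translated, unmapped = [], []
    for code in codes:
        key = code.strip().upper()
        if key in crosswalk:
            translated.append(crosswalk[key])
        else:
            # Keep the original value so a data steward can investigate it.
            unmapped.append(code)
    return translated, unmapped


# "UK" is a common stand-in for the United Kingdom, but the ISO alpha-2 code is GB.
translated, unmapped = translate_codes(["za", "SA ", "UK"])
```

The unmapped list is the interesting output: it shows exactly where two parties only think they share a standard.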

It is also critical to understand whether your external partners are using the same standards as you are. Do you have a common understanding of shared data – or do you just think you do?

The most critical lesson is not to assume that a standard is consistent across systems. A data profiling exercise will very quickly highlight inconsistencies and allow you to manage these.
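A very basic profiling check can be sketched in a few lines: classify each captured value by its shape and count the results. This is only a rough illustration, not a substitute for a profiling tool; the function names and the sample values are hypothetical.

```python
from collections import Counter


def classify_code(value: str) -> str:
    """Describe the shape of a captured country value (letter case is ignored)."""
    value = value.strip()
    if not value.isascii():
        return "other"
    if value.isalpha() and len(value) == 2:
        return "alpha-2"
    if value.isalpha() and len(value) == 3:
        return "alpha-3"
    if value.isdigit() and len(value) == 3:
        return "numeric"
    return "other"


def profile_codes(values):
    """Count how many captured values fall into each shape."""
    return Counter(classify_code(v) for v in values)


# Hypothetical sample of values found in one "country code" column.
sample = ["ZA", "ZAF", "710", "za", "South Africa", ""]
profile = profile_codes(sample)
```

A column that is supposed to hold one code type but profiles into three or four shapes is a clear sign that more than one standard is in use. Note that the check looks only at shape; a value with the right shape may still not be a valid code.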

Data Quality is critical!

Reference data is typically presented to the user in the form of a description, but captured as a code. Imagine a database where the code “M” is assigned to “Female” and the code “F” is assigned to “Male”. The user captures the gender as “Female”, which is stored as “M”, but for reporting purposes the business looks at the code “M” and assumes it means Male. You will have a complete reversal of your gender statistics.
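The effect of the swapped lookup described above can be reproduced in a short Python sketch. The table and function names are hypothetical; the point is only that capture uses the description, storage uses the code, and reporting applies its own assumption about what the code means.

```python
from collections import Counter

# Faulty lookup table: descriptions shown to the user map to swapped codes.
FAULTY_LOOKUP = {"Female": "M", "Male": "F"}

# What report writers assume the stored codes mean.
ASSUMED_MEANING = {"M": "Male", "F": "Female"}


def report_counts(captured, lookup, meaning):
    """Store each captured description as a code, then report on the codes."""
    stored = [lookup[description] for description in captured]
    return Counter(meaning[code] for code in stored)


# Three women and one man are captured correctly on screen...
captured = ["Female", "Female", "Female", "Male"]
# ...but the report shows three men and one woman.
report = report_counts(captured, FAULTY_LOOKUP, ASSUMED_MEANING)
```

Nothing the user typed was wrong, and nothing in the report logic was wrong; the error lives entirely in the lookup table.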

This is a simple example – many reference tables contain long lists.

For example, there are approximately 200 countries in the world. The Standard Industrial Classification (SIC) standards comprise hundreds of codes used to identify the principal business of a company. These codes are used by banks to measure their exposure to a particular segment.

Errors in lookup tables can have severe consequences.

Imagine that your life insurance business pushed up the premiums for all women (thinking they were men) and reduced the premiums for all men (who appear as women) – on the basis of a faulty lookup code!

The women would probably look for a better deal somewhere else. The men would expose you to a much higher risk than was statistically viable. Your earnings and share price would plummet!

Regular data quality audits of your reference data are important to ensure that simple typing errors do not expose your business to this kind of risk. These audits should be built into your governance frameworks and should also be mandatory as test cases for any system upgrades or enhancements.
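An audit of this kind can be expressed as a repeatable check that compares the lookup table in a system against an agreed list of expected values. The sketch below is one possible shape for such a test, assuming the lookup table has been read into a dictionary of code to description; `audit_lookup` and the expected values are hypothetical.

```python
# Agreed reference values, signed off by the data owner (hypothetical).
EXPECTED_GENDER_CODES = {"M": "Male", "F": "Female"}


def audit_lookup(actual: dict, expected: dict) -> list:
    """Compare a lookup table to the expected values; return a list of findings."""
    findings = []
    for code in sorted(expected.keys() | actual.keys()):
        if code not in actual:
            findings.append(f"missing code {code!r}")
        elif code not in expected:
            findings.append(f"unexpected code {code!r}")
        elif actual[code].strip().lower() != expected[code].strip().lower():
            findings.append(
                f"code {code!r}: expected {expected[code]!r}, found {actual[code]!r}"
            )
    return findings


# The swapped table produces one finding per code.
findings = audit_lookup({"M": "Female", "F": "Male"}, EXPECTED_GENDER_CODES)
```

An empty list means the table passed. Running the same check after every upgrade or enhancement turns the audit into a regression test.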


A data profiling and discovery tool makes it easy for business data stewards to perform these checks and should be seriously considered as part of your data governance toolset.

