There are many similarities between reference data and master data – both can be defined as an abstraction of a real-world person or thing. In some cases, the terms are used interchangeably.
The term reference data most commonly describes reusable lists of codes and descriptions stored in a database in the form of a “lookup table”, for example, the ISO 3166-1 list of country codes and names. In many cases this data represents information that is external to the organisation.
Some people are starting to use the term as a synonym for master data – for example, your company’s definitive list of clients could be regarded as reference data by other applications that need the latest, most accurate information about a specific client. This data is largely internal – it is unlikely to be reusable outside your organisation.
The problem with standards!
- International standards are not always standard! For example, ISO 3166-1 defines a two-letter (alpha-2) code, a three-letter (alpha-3) code and a three-digit numeric code for each country, and each scheme reserves a range of user-assigned codes. This means that two systems, each of which uses the “same” international standard, may have different codes for the same country.
- There are multiple standards – although the ISO standard is becoming generally accepted for countries, various other standards, such as the NATO list, are in common use. Because these standards share many values, it can be difficult to determine which one a given code comes from.
- There are multiple versions of the truth. Standards are revised and updated, so depending on when each of your systems was implemented, the same code may mean different things in different systems. The ISO 3166-1 alpha-2 code “CS”, for example, identified Czechoslovakia until 1993 and was later reassigned to Serbia and Montenegro.
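To show how the same country can arrive in several forms, here is a minimal Python sketch that normalises an ISO 3166-1 alpha-2, alpha-3 or numeric code to alpha-2. The five-row table is a hypothetical excerpt for illustration only, and `to_alpha2` is a hypothetical helper. A real implementation would load the full list for the version of the standard each system was built against. Note that it only tells you whether a code is recognisable as ISO; it cannot tell you whether a matching value actually came from another list, such as NATO’s.

```python
from typing import Optional, Union

# Hypothetical excerpt of ISO 3166-1 for illustration; load the full, dated list in practice.
ISO_3166_1 = [
    # (alpha-2, alpha-3, numeric, name)
    ("AF", "AFG", "004", "Afghanistan"),
    ("DE", "DEU", "276", "Germany"),
    ("GB", "GBR", "826", "United Kingdom"),
    ("US", "USA", "840", "United States"),
    ("ZA", "ZAF", "710", "South Africa"),
]

# One index per code form, each pointing at the alpha-2 code as the canonical key.
BY_ALPHA2 = {a2: a2 for a2, _, _, _ in ISO_3166_1}
BY_ALPHA3 = {a3: a2 for a2, a3, _, _ in ISO_3166_1}
BY_NUMERIC = {num: a2 for a2, _, num, _ in ISO_3166_1}


def to_alpha2(raw: Union[str, int]) -> Optional[str]:
    """Map an alpha-2, alpha-3 or numeric code to alpha-2; None if unrecognised."""
    code = str(raw).strip().upper()
    if code.isdigit():
        # Numeric codes are three digits; restore leading zeros lost when stored as integers.
        return BY_NUMERIC.get(code.zfill(3))
    if len(code) == 2:
        return BY_ALPHA2.get(code)
    if len(code) == 3:
        return BY_ALPHA3.get(code)
    return None
```

With this mapping, “ZA” from one system, “zaf” from another and 710 from a third all resolve to “ZA”. A code that is not in the table, such as the user-assigned “XK”, returns `None` and should be flagged for a data steward rather than guessed.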
It is also critical to understand whether your external partners use the same standards as you do. Do you have a common understanding of shared data – or do you just think you do?
The most critical lesson is not to assume that a standard is applied consistently across systems. A data profiling exercise will very quickly highlight inconsistencies and allow you to manage them.
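A first profiling pass can be as simple as comparing the distinct codes actually stored in each system. The Python sketch below assumes you can extract the country-code column from each system as a list of strings. `profile_codes` and the sample extracts are hypothetical. Codes present in only one system, or a mix of code lengths within one system, are the inconsistencies worth investigating.

```python
from collections import Counter
from typing import Dict, List


def profile_codes(system_a: List[str], system_b: List[str]) -> Dict[str, object]:
    """Compare the distinct codes stored in two systems' extracts."""
    codes_a, codes_b = set(system_a), set(system_b)
    return {
        "only_in_a": sorted(codes_a - codes_b),
        "only_in_b": sorted(codes_b - codes_a),
        "shared": sorted(codes_a & codes_b),
        # Rows per code length: a mix of 2 and 3 suggests more than one standard in use.
        "lengths_a": dict(Counter(len(c) for c in system_a)),
        "lengths_b": dict(Counter(len(c) for c in system_b)),
    }


# Hypothetical extracts: system A uses alpha-2 codes; system B mixes alpha-3,
# alpha-2 and the user-assigned code "XK".
report = profile_codes(["ZA", "ZA", "GB", "US"], ["ZAF", "GBR", "US", "XK"])
for key, value in report.items():
    print(key, value)
```

In this sample, GB/GBR and ZA/ZAF appear as mismatches even though they name the same countries, and system B stores both two- and three-character codes. Both findings point to mixed standards rather than missing data.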
Data Quality is critical!
Reference data is typically presented to the user as a description but captured as a code. Imagine a database where the code “M” is assigned to “Female” and the code “F” is assigned to “Male”. The user captures the gender as “Female”, so the database stores “M”. For reporting purposes, the business looks at the code (“M”) and assumes it means Male. The result is a complete reversal of your gender statistics.
This is a simple example – many reference tables contain long lists. For example, there are approximately 200 countries in the world. The Standard Industrial Classification (SIC) standards comprise hundreds of codes used to identify the principal business of a company. Banks use these codes to measure their exposure to a particular industry segment.
Errors in lookup tables can have severe consequences.
Imagine that your life insurance business raised the premiums for all women (thinking they were men) and reduced the premiums for all men (who appear as women) – all on the basis of a faulty lookup code!
The women would probably look for a better deal somewhere else. The men would expose you to a much higher risk than their premiums could support. Your earnings and share price would plummet!
Regular data audits of your reference data are important to ensure that simple typing errors do not expose your business to this kind of risk. These audits should be built into your governance frameworks and should also be mandatory as test cases for any system upgrades or enhancements.
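An audit like this can be automated and added to the test suite for every upgrade. The following is a minimal Python sketch. It assumes the lookup table can be read as (code, description) pairs and that the business data steward maintains the expected values. `EXPECTED_GENDER` and `audit_lookup` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical expected values, agreed and maintained by the business data steward.
EXPECTED_GENDER = {"F": "Female", "M": "Male"}


def audit_lookup(
    rows: List[Tuple[str, str]], expected: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (code, found, expected) for every row that disagrees with the agreed values."""
    problems = []
    found_codes = set()
    for code, description in rows:
        found_codes.add(code)
        want = expected.get(code)
        if want is None:
            problems.append((code, description, "<unexpected code>"))
        elif description != want:
            problems.append((code, description, want))
    # Agreed codes that are missing from the table are also errors.
    for code in sorted(set(expected) - found_codes):
        problems.append((code, "<missing>", expected[code]))
    return problems


# The faulty table from the gender example: codes and descriptions swapped.
faulty = [("M", "Female"), ("F", "Male")]
print(audit_lookup(faulty, EXPECTED_GENDER))
# [('M', 'Female', 'Male'), ('F', 'Male', 'Female')]
```

In an upgrade test suite, the same call becomes `assert audit_lookup(rows, EXPECTED_GENDER) == []`. A swapped or mistyped description then fails the test before it reaches production.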
A data profiling and discovery tool set makes it easy for business data stewards to perform these checks and should be seriously considered as part of your governance tool set.