Metadata management – When is H2O not water?

Discover the importance of metadata management and its impact on data quality. Learn how ambiguity in data can have life-or-death implications, and explore the best practices for handling ambiguous records.



One of my children posted the attached joke on Facebook recently.

Life-or-Death Consequences

It raises an interesting point.

Ambiguity in data can have life-or-death implications.

In most companies, data volumes prohibit manual exception management of ambiguous data – particularly for matching.

Where data volumes are lower, staff numbers tend to be low too, so even 70 or 80 exceptions a month may be too many for a small business to deal with.

Automated matching

Automated matching technologies are extremely useful tools for identifying duplicate records in your client base, inventory, supply chain, or elsewhere.

The question is how to ensure that ambiguity is dealt with correctly.

Dealing with ambiguity

There are two extremes.

At one extreme, you do not match any ambiguous records. “J. Smith” may or may not be “John Smith” – so you continue to treat these as two separate clients.

At the other extreme, you match all ambiguous records – of course “J. Smith” must be “John Smith”!

False positive matches, where you incorrectly assume that two separate entities are the same, expose your business to far more risk than false negatives, where you incorrectly continue to treat a single entity as two or more.

Which is worse: that as a client I have two account numbers and receive two invoices, or that as a client I cease to exist in your dataset and am never invoiced?

Any matching process that does not first standardise data to remove ambiguity is prone to false positive matches that may never be picked up.

The key requirement for matching is to break data into its elements – e.g. a product description may be made up of the BRAND, the UNIT OF MEASURE, the MATERIAL and the COLOUR.

Each of these elements should be standardised to the extent necessary to remove ambiguity. Is “H2O too” really the same as “H2O2”? No!
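As a rough illustration of element-level standardisation, here is a minimal Python sketch. The lookup tables, field names and the `standardise_product` function are hypothetical, invented for this example rather than taken from any particular tool. The point it tries to show is that a value that cannot be resolved with certainty (such as “H2O too”) is left unresolved, not guessed.

```python
import re

# Hypothetical lookup tables: each maps known raw spellings to one
# canonical value. Real tables would be far larger and domain-specific.
MATERIALS = {
    "h2o": "WATER",
    "water": "WATER",
    "h2o2": "HYDROGEN PEROXIDE",
    "hydrogen peroxide": "HYDROGEN PEROXIDE",
}
UNITS = {
    "l": "LITRE",
    "litre": "LITRE",
    "liter": "LITRE",
    "ml": "MILLILITRE",
}


def _clean(raw):
    """Trim, lower-case and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def standardise(raw, lookup):
    """Return the canonical value, or None if the raw value is not recognised."""
    return lookup.get(_clean(raw))


def standardise_product(product):
    """Standardise each element of a product record separately."""
    return {
        "brand": _clean(product["brand"]).upper(),
        "unit_of_measure": standardise(product["unit_of_measure"], UNITS),
        "material": standardise(product["material"], MATERIALS),
        "colour": _clean(product["colour"]).upper(),
    }


example = standardise_product(
    {"brand": " Acme ", "unit_of_measure": "Liter", "material": "H2O too", "colour": "clear"}
)
# The material stays None: "H2O too" is not silently treated as "H2O2".
print(example)
```

Leaving the unresolved element as `None` is a deliberate design choice in this sketch: it keeps the ambiguity visible to whatever matching step comes next.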

Where ambiguity remains it is better to err on the side of caution – merging two client or product records incorrectly can have catastrophic consequences and be almost impossible to undo.
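One way to picture this cautious approach is a three-way decision instead of a simple yes or no. The sketch below is an assumption-laden illustration, not a description of any specific matching product: it assumes records are already standardised dictionaries, that `None` marks an element that could not be resolved, and that the names `match_decision` and `KEY_ELEMENTS` are made up for the example.

```python
# Hypothetical list of the elements that decide whether two products match.
KEY_ELEMENTS = ("brand", "unit_of_measure", "material")


def match_decision(a, b, key_elements=KEY_ELEMENTS):
    """Return 'MATCH', 'NO_MATCH' or 'REVIEW' for two standardised records.

    Records are only merged on 'MATCH'. 'REVIEW' means the records stay
    separate until someone (or something) resolves the ambiguity.
    """
    pairs = [(a.get(k), b.get(k)) for k in key_elements]

    # Any known, conflicting element proves the records are different.
    if any(x is not None and y is not None and x != y for x, y in pairs):
        return "NO_MATCH"

    # No conflict, but at least one element is unresolved: do not merge.
    if any(x is None or y is None for x, y in pairs):
        return "REVIEW"

    # Every key element is known and identical.
    return "MATCH"


water = {"brand": "ACME", "unit_of_measure": "LITRE", "material": "WATER"}
peroxide = {"brand": "ACME", "unit_of_measure": "LITRE", "material": "HYDROGEN PEROXIDE"}
unclear = {"brand": "ACME", "unit_of_measure": "LITRE", "material": None}

print(match_decision(water, dict(water)))   # MATCH
print(match_decision(water, peroxide))      # NO_MATCH
print(match_decision(water, unclear))       # REVIEW
```

Only the `REVIEW` cases need human attention, and because they are never merged automatically, a wrong guess cannot quietly destroy a record.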

With experience, you can quickly identify which are the key elements of an object and focus your effort on standardising these only.

This gives you certainty that successful matches are correct and need not be treated as exceptions, and allows you to focus your attention on the much smaller subset of records that were flagged as possible matches but not merged (if this is important).

This will reduce your operational data management costs and will substantially increase your return on investments in MDM, CRM or similar “single view” technologies.
