Beware the Ides of Match


If you have heard of the Ides of March, you know you’re supposed to beware them. Why? In ancient Rome, the Ides of March were equivalent to our March 15th. In William Shakespeare’s play, Julius Caesar, a soothsayer warns Caesar to “Beware the Ides of March”, the day on which he would be assassinated by a group of senators including his friend, Brutus.

15 March is just around the corner. What better time to call attention to poorly defined, ineffective or inaccurate match strategies?

Matching (or linking) is a key capability of data quality and master data management. It is the capability to identify duplicate records within a group of, for example, customers, suppliers, products or materials.

Yet, some of the most common match approaches are fraught with problems.

The absolute match. This approach relies on data being identical.

Absolute matches struggle in the real world because real data is frequently dissimilar. Data may be misspelled, incomplete or inconsistent.
Less obviously, values may be identical without referring to the same thing: for example, two empty telephone numbers do not indicate the same contact.

This is why a plain SQL join on exact values can never really solve a match problem.
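Both failure modes of the absolute match can be seen in a few lines of Python. This is only a minimal sketch: the function names and sample values are invented for illustration and do not come from any particular tool.

```python
def normalise(value):
    """Trim whitespace and upper-case; None becomes an empty string."""
    return (value or "").strip().upper()


def absolute_match(a, b):
    """Naive exact comparison: identical values count as a match."""
    return normalise(a) == normalise(b)


def guarded_match(a, b):
    """Exact comparison that refuses to match when a value is empty."""
    a, b = normalise(a), normalise(b)
    return bool(a) and a == b


# A one-letter misspelling is enough to miss a genuine duplicate.
print(absolute_match("Jon Smith", "John Smith"))  # False

# Two empty telephone numbers are identical, so the naive rule links them.
print(absolute_match("", ""))  # True

# The guarded rule treats missing data as "unknown", not as evidence.
print(guarded_match("", ""))  # False
```

The guard removes the false positive on empty values, but it does nothing for the misspelling, which still needs a more tolerant comparison.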

Overly simplistic match algorithms, such as SOUNDEX

A number of expensive data quality tools rely on overly simplistic match algorithms, such as SOUNDEX.

The algorithms cast too broad a net, linking values which are not in fact similar.

Let us assume, for example, that we have three contacts: GARY, GERRY and GERALD. Using SOUNDEX, the first two contacts would match (both encode as G600, while GERALD encodes as G643), even though it is in fact the last two that should have matched, since Gerry is a common short form of Gerald.
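The codes in this example can be reproduced with a short Python sketch of the classic American Soundex rules. It is written from the commonly published description of the algorithm; the SOUNDEX function built into a given database or data quality tool may differ in small details.

```python
# Letter-to-digit table for classic American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept after them
    return (result + "000")[:4]


for contact in ("GARY", "GERRY", "GERALD"):
    print(contact, soundex(contact))
# GARY G600
# GERRY G600
# GERALD G643
```

Because vowels are discarded and repeated consonants collapse, GARY and GERRY become indistinguishable, while the extra consonants in GERALD push it into a different code.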

Overly simple match approaches miss matches that should happen, while creating false positive matches, linking records that should not be linked.

Successful matching requires three things:

  1. Data of sufficient quality. This means that data must be reasonably complete, reasonably standardised, etc.
  2. A match approach that can handle some level of missing or inaccurate data (fuzzy) while being repeatable (deterministic).
  3. The ability to test and isolate specific non-functioning match rules and, in particular, remove false positive matches.
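The second and third requirements can be sketched in Python as a rule that tolerates small differences, always gives the same answer for the same input, and can be checked against labelled examples. Everything here is an assumption made for illustration: the nickname table, the record fields, the 0.85 threshold and the sample records are hypothetical, and the rule is not one of the proven match rules mentioned in this post.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real project would maintain a far larger one.
NICKNAMES = {"GERRY": "GERALD", "BILL": "WILLIAM"}


def canonical_first_name(name):
    """Upper-case the name and map a known nickname to its full form."""
    name = name.strip().upper()
    return NICKNAMES.get(name, name)


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().upper(), b.strip().upper()).ratio()


def is_match(rec_a, rec_b, threshold=0.85):
    """Same canonical first name and a similar surname; blanks never match."""
    fields = ("first", "last")
    if not all(rec[f].strip() for rec in (rec_a, rec_b) for f in fields):
        return False
    if canonical_first_name(rec_a["first"]) != canonical_first_name(rec_b["first"]):
        return False
    return similarity(rec_a["last"], rec_b["last"]) >= threshold


def evaluate(rule, labelled_pairs):
    """Return the pairs a rule links wrongly and the pairs it wrongly misses."""
    false_pos = [p for p, expected in labelled_pairs if rule(*p) and not expected]
    false_neg = [p for p, expected in labelled_pairs if not rule(*p) and expected]
    return false_pos, false_neg


# Invented test records, each pair labelled with the expected outcome.
gerry = {"first": "Gerry", "last": "Thompson"}
gerald = {"first": "Gerald", "last": "Thomson"}
gary = {"first": "Gary", "last": "Thompson"}
blank = {"first": "", "last": ""}

LABELLED = [
    ((gerry, gerald), True),   # nickname plus a small surname variation
    ((gary, gerry), False),    # the pair SOUNDEX would have linked
    ((blank, blank), False),   # identical but empty
]

false_positives, false_negatives = evaluate(is_match, LABELLED)
print(len(false_positives), len(false_negatives))  # 0 0
```

Keeping the labelled pairs separate from the rule makes it possible to change one rule or threshold at a time and see exactly which false positives appear or disappear.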

This takes some thought, some effort and some testing.

We help by providing you with proven match rules that avoid these common errors, without requiring your project team to reinvent the wheel.

Failures in matching will derail your entire master data management project.

Beware the Ides of Match.
