
To ensure the accuracy and reliability of your data, it’s essential to understand and mitigate common data quality issues.
If you have heard of the Ides of March, you know you’re supposed to beware them.
Why?
In ancient Rome, the Ides of March were equivalent to our March 15th. In William Shakespeare’s play, Julius Caesar, a soothsayer warns Caesar to “Beware the Ides of March”, the day on which he would be assassinated by a group of senators including his friend, Brutus.
15 March is just around the corner. What better time to call attention to poorly defined, ineffective or inaccurate match strategies?
Matching (or linking) is a key capability of data quality and master data management. It is the capability to identify duplicate records within a group of, for example, customers, suppliers, products or materials.
Common Match Approaches and their Issues
Yet, some of the most common match approaches are fraught with problems.
Absolute Matching
The absolute (or exact) match approach relies on data being identical.
Absolute matches struggle in the real world because real data is frequently dissimilar. Data may be misspelt, incomplete or inconsistent.
Less obviously, values may be identical without referring to the same thing. For example, two empty telephone numbers do not indicate the same contact.
This is why exact SQL comparisons and joins can never really solve a matching problem on their own.
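Both failure modes can be seen in a minimal Python sketch. The contact records, field names and the `exact_match_pairs` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical contact records; names, phone numbers and fields are illustrative only.
contacts = [
    {"id": 1, "name": "John Smith", "phone": "011 555 0100"},
    {"id": 2, "name": "Jon Smith", "phone": "0115550100"},
    {"id": 3, "name": "Mary Jones", "phone": ""},
    {"id": 4, "name": "Peter Brown", "phone": ""},
]


def exact_match_pairs(records, field):
    """Return id pairs whose values for `field` are strictly identical."""
    pairs = []
    for i, left in enumerate(records):
        for right in records[i + 1:]:
            if left[field] == right[field]:
                pairs.append((left["id"], right["id"]))
    return pairs


# Misses the likely duplicate (records 1 and 2) because of a spelling variation.
name_pairs = exact_match_pairs(contacts, "name")

# Links records 3 and 4 only because both phone numbers are empty.
phone_pairs = exact_match_pairs(contacts, "phone")

print(name_pairs)
print(phone_pairs)
```

The name comparison finds no pairs at all, while the phone comparison links two unrelated contacts because an empty value is treated as evidence.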
Simplistic Match Algorithms
A number of expensive data quality tools rely on overly simplistic match algorithms, such as Soundex.
These algorithms cast too broad a net, linking values that are not in fact similar.
Let us assume, for example, that we have three contacts: GARY, GERRY and GERALD. Using Soundex, the first two contacts would match (both encode to G600, while GERALD encodes to G643), even though it is, in fact, the last two that should have matched.
Overly simple match approaches miss matches that should happen, while creating false positive matches: linking records that should not be linked.
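The GARY, GERRY and GERALD example can be checked with a short Python sketch of the standard American Soundex rules. The `soundex` function below is written for illustration and is not taken from any particular data quality tool.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code for `name`."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not break a run of identical codes
        code = CODES.get(letter, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run, so a repeated code is kept
    return (result + "000")[:4]  # pad with zeros, truncate to four characters


for contact in ("GARY", "GERRY", "GERALD"):
    print(contact, soundex(contact))
```

GARY and GERRY share the code G600, while GERALD receives G643, so a Soundex-only rule links the wrong pair and misses the right one.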
Three requirements for successful matching
Successful matching requires three things:
- Data of sufficient quality. This means that data must be reasonably complete, reasonably standardised, etc.
- A match approach that can handle some level of missing or inaccurate data (fuzzy) while being repeatable (deterministic).
- The ability to test and isolate specific non-functioning match rules and, in particular, remove false positive matches.
This takes some thought, some effort and some testing.
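The second and third requirements can be sketched in Python as a set of named rules, each of which can be tested on its own. This is a minimal sketch under stated assumptions, not a production match strategy: the rule names, the record layout and the 0.85 similarity threshold are all hypothetical, and `difflib.SequenceMatcher` from the standard library stands in for whatever fuzzy comparison a real tool would use.

```python
from difflib import SequenceMatcher


def normalise(value):
    """Upper-case and collapse whitespace so formatting noise is ignored."""
    return " ".join(value.upper().split())


def similarity(left, right):
    """Return a repeatable 0..1 score; an empty value is never evidence."""
    left, right = normalise(left), normalise(right)
    if not left or not right:
        return 0.0
    return SequenceMatcher(None, left, right).ratio()


def rule_name_fuzzy(a, b, threshold=0.85):
    # The threshold is an assumption; it would need tuning against real data.
    return similarity(a["name"], b["name"]) >= threshold


def rule_phone_exact(a, b):
    # Compare digits only, and require a non-empty number on both sides.
    digits_a = "".join(ch for ch in a["phone"] if ch.isdigit())
    digits_b = "".join(ch for ch in b["phone"] if ch.isdigit())
    return bool(digits_a) and digits_a == digits_b


# Named rules make it possible to see which rule produced a given link.
RULES = {"name_fuzzy": rule_name_fuzzy, "phone_exact": rule_phone_exact}


def fired_rules(a, b):
    """List the names of the rules that link records `a` and `b`."""
    return [name for name, rule in RULES.items() if rule(a, b)]


john = {"name": "John Smith", "phone": "011 555 0100"}
jon = {"name": "Jon  Smith", "phone": "0115550100"}
mary = {"name": "Mary Jones", "phone": ""}
peter = {"name": "Peter Brown", "phone": ""}

print(fired_rules(john, jon))
print(fired_rules(mary, peter))
```

Because every link records the rule that created it, a rule that generates false positives can be isolated, adjusted or switched off without disturbing the others.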
We help by providing you with proven match rules that avoid these common errors, without requiring your project team to reinvent the wheel.
Failures in matching can derail your entire master data management project.
Beware the Ides of Match.
