A great post from Infotrellis on matching techniques.
One area this does not cover is the ease of tuning of match rules.
Due to their complexity, pure probabilistic approaches can prove very difficult to understand and therefore to tune.
When we are matching and expect the results to be used to update production data, we have had to write specific test cases that enable the business to isolate individual match criteria and be confident that the results are as expected.
Very simply, probabilistic matching struggles to support this, as its results are not granular and the process of getting to them is not easily understood. If humans have to validate every match, this defeats the object.
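To make the point about isolating match criteria concrete, here is a minimal sketch in Python of rule-based (deterministic) matching where every rule has a name and can be tested on its own. The Customer fields, the two rules and the normalise helper are all hypothetical, chosen only for illustration; they are not taken from the original post or from any particular MDM product.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout, for illustration only.
    name: str
    postcode: str
    email: str


def normalise(value: str) -> str:
    """Lower-case the value and remove all whitespace before comparing."""
    return "".join(value.lower().split())


def rule_email_exact(a: Customer, b: Customer) -> bool:
    """Match when both records carry the same non-empty email."""
    return bool(a.email) and normalise(a.email) == normalise(b.email)


def rule_name_and_postcode(a: Customer, b: Customer) -> bool:
    """Match when both name and postcode agree after normalisation."""
    return (
        normalise(a.name) == normalise(b.name)
        and normalise(a.postcode) == normalise(b.postcode)
    )


# Each rule is named, so a test case can target exactly one criterion.
RULES = {
    "email_exact": rule_email_exact,
    "name_and_postcode": rule_name_and_postcode,
}


def explain_match(a: Customer, b: Customer) -> list[str]:
    """Return the names of the rules that fired for this pair."""
    return [name for name, rule in RULES.items() if rule(a, b)]


a = Customer("Jane Smith", "AB1 2CD", "jane@example.com")
b = Customer("jane smith", "ab12cd", "j.smith@example.com")

# A business-readable test case: only the name and postcode rule fires.
assert explain_match(a, b) == ["name_and_postcode"]
```

Because the result is a list of named rules rather than a single score, a reviewer can see which criterion produced the match and tune that rule without disturbing the others. A purely probabilistic score would not, on its own, give that breakdown.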
Ultimately, trust is at stake – there is no silver bullet to matching.
Matching requires planning, it requires experience and it requires the right approach.
It is not surprising that most MDM tools offer connectivity to specialist data quality and matching tools (even if they don't always advertise this).
If the approach taken by your MDM tool requires masses of human intervention, you may want to consider alternatives.
Ask them which data quality tools can plug in – you may be surprised at the options this gives you to save time and increase confidence in your match results.
Deterministic Matching versus Probabilistic Matching
Which is better, Deterministic Matching or Probabilistic Matching?
I am not promising to give you an answer. But through this article, I would like to share some of my hands-on experiences that may give you some insights and help you make an informed decision with regard to your MDM implementation.
Before I got into the MDM space three years ago, I worked on systems development across various industries that deal with Customer data. It was a known fact that duplicate Customers existed in those systems. But it was a problem that was too complicated to address and was not on the priority list, as it wasn’t exactly revenue-generating. Therefore, the reality of the situation was simply accepted, and systems were built to handle and work around the issue of duplicate Customers.
Corporations, particularly the large ones, are now recognizing the importance of having a better knowledge…