In my last post I focused on the importance of a matching approach that can be easily understood by a human. Most important is the ability to isolate (and improve or remove from the match) specific failing instances. This ability enables trust in the automated results – which, in turn, means that the business does not need to wade through and validate every match – an impossible task in any real environment.
The overriding assumption for matching is that data is sufficiently similar for a computer to consistently make the correct decision. Data cleansing is a critical prerequisite for this, for two principal reasons:
- We may need to add missing data for key fields (enrichment).
- We may need to ensure that data is captured more consistently (standardisation).
I have delivered a number of projects for which client name was the only common or shared attribute. Unfortunately, name is a very poor indicator of uniqueness, whether for a business or an individual. By adding additional information – such as date of birth, company registration number, address information, income tax numbers and so on – we can improve our confidence that Mr Smith and Joe Smith are the same individual.
In many cases, the information is available within the corporate environment but may not be shared across all applications.
So, for example, telephone number may be held in the client master and call centre applications but not in the billing system. Similarly, date of birth may be held in the client master and billing applications, but not in the call centre. If we match between the client master and the billing application using name and date of birth, we can add a telephone number to the billing system, which can, in turn, be used to match to the call centre.
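This kind of transitive enrichment can be sketched in a few lines. The records and field names below are invented for illustration – a real implementation would sit on top of a proper matching engine, not an exact-key lookup:

```python
# Illustrative sketch: propagate a telephone number from the client master
# to the billing system by matching on name and date of birth.
# All records and field names are invented for this example.

client_master = [
    {"name": "JOE SMITH", "dob": "1980-11-12", "phone": "011 555 1234"},
]

billing = [
    {"name": "JOE SMITH", "dob": "1980-11-12", "phone": None},
]

def enrich_phone(master, target):
    """Copy phone onto target records that match on name + date of birth."""
    lookup = {(m["name"], m["dob"]): m["phone"] for m in master}
    for rec in target:
        if rec["phone"] is None:
            rec["phone"] = lookup.get((rec["name"], rec["dob"]))

enrich_phone(client_master, billing)
print(billing[0]["phone"])  # the billing record now carries the master's number
```

The enriched telephone number can then serve as an additional match key against the call centre system.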
This is a simplistic example, but in practice, the more information you can derive and add to each system, the more flexibility you will have in your matching to other, less populated systems. This is necessary to engender business confidence and trust in the final result.
The other critical factor to consider is the standardisation of data.
Computers are not good at magically resolving serious ambiguity.
Simple standardisation might involve recoding the “1” and “2” used in one system to the “Male” and “Female” used by another. Is Joe Smith, born on 12/11/09, the same person as Joe Smith, born on 09-12-11? It depends – are both dates stored as day/month/year? By resolving these simple ambiguities in advance we can radically improve matching accuracy.
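Both fixes amount to applying each source system's declared convention rather than guessing. A minimal sketch, assuming one system stores dates as month/day/year and the other as year-month-day:

```python
from datetime import datetime

def standardise_date(raw, source_format):
    """Normalise a date string to ISO format using the source's known layout."""
    return datetime.strptime(raw, source_format).date().isoformat()

# Assumed formats for the example: system A is month/day/year,
# system B is year-month-day. Made explicit, the two dates agree.
a = standardise_date("12/11/09", "%m/%d/%y")
b = standardise_date("09-12-11", "%y-%m-%d")
print(a, b, a == b)

# Simple code-table recoding: map one system's codes onto the other's labels.
GENDER_MAP = {"1": "Male", "2": "Female"}
print(GENDER_MAP["1"])
```

The point is that the mapping tables and formats are agreed up front, so the matcher never has to resolve the ambiguity itself.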
More complex standardisation requires the use of data quality tools.
For example, in South Africa name and address data is commonly stored in both English and Afrikaans language formats. So “ABC Apteek EDMS BK, Kerkstraat 12, Richterspark Uit 7, Potgietersrus” is equivalent to “ABC Pharmacy (Pty) Ltd, 12 Church Street, Richters Park Ext 7, Mokopane”. Our robust name and address parser understands and manages these kinds of issues off the shelf. The parser generates standardised name and address elements – e.g. house number, street name, suburb name – that are used by the matcher.
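To make the idea concrete, here is a toy, dictionary-driven version of the principle: split a street line into standardised elements (house number, street name), allowing for the Afrikaans convention of placing the number after the street name. The real parser referenced above is a full data quality tool; this sketch, with its invented one-entry dictionary, is only an illustration:

```python
import re

# Invented mini-dictionary mapping an Afrikaans street name to English.
STREET_TERMS = {"KERKSTRAAT": "CHURCH STREET"}

def parse_street(line):
    """Split a street line into house number and standardised street name."""
    line = line.strip().upper()
    # Afrikaans style: "KERKSTRAAT 12" (number after the street name)
    m = re.match(r"^(\D+?)\s+(\d+)$", line)
    if m:
        street, number = m.group(1), m.group(2)
    else:
        # English style: "12 CHURCH STREET"
        m = re.match(r"^(\d+)\s+(\D+)$", line)
        number, street = m.group(1), m.group(2)
    return {"house_number": number,
            "street_name": STREET_TERMS.get(street, street)}

print(parse_street("Kerkstraat 12"))
print(parse_street("12 Church Street"))
```

Both inputs parse to the same standardised elements, which is exactly what the matcher needs.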
Similarly, other free text data fields, such as SAP material descriptions, can be parsed and broken into structured, standardised elements: “Groen TYT CRLLA” is equivalent to “Toyota Corolla, Green”.
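The same lookup-table approach applies to material descriptions. The abbreviation and colour tables below are invented for this example; a production data quality tool would carry far richer dictionaries and parsing rules:

```python
# Toy illustration: parse a free-text material description into
# standardised elements using lookup tables.
MAKE_ABBREVIATIONS = {"TYT": "Toyota"}
MODEL_ABBREVIATIONS = {"CRLLA": "Corolla"}
COLOURS = {"GROEN": "Green", "GREEN": "Green"}  # Afrikaans and English

def parse_material(description):
    """Map each token to a standardised element, if the dictionaries know it."""
    elements = {}
    for token in description.upper().split():
        if token in MAKE_ABBREVIATIONS:
            elements["make"] = MAKE_ABBREVIATIONS[token]
        elif token in MODEL_ABBREVIATIONS:
            elements["model"] = MODEL_ABBREVIATIONS[token]
        elif token in COLOURS:
            elements["colour"] = COLOURS[token]
    return elements

print(parse_material("Groen TYT CRLLA"))
# {'colour': 'Green', 'make': 'Toyota', 'model': 'Corolla'}
```

Once both systems' descriptions are reduced to the same structured elements, matching becomes a straightforward comparison.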
There is no magic bullet for Master Data Management. By embedding data governance and data quality principles you will avoid the principal mistakes that cause most MDM projects to run heavily over budget or fail.
It is critical to understand, in advance, that whether you use an existing data source as your master or implement one of the newfangled hub or bus platforms, MDM has to be about the data.