Simple tips for designing for data quality

Why is data quality essential? In today’s data-driven world, ensuring the quality of data is crucial for gaining actionable insights. Discover the significance of data quality in driving business outcomes.

Is your company struggling with the quality of data across and in your enterprise systems?

Most, if not all, data quality issues are caused by human error.

Data Capture Issues

Approximately 80% of errors are simple data capture errors – users entering the wrong information – with the balance largely arising through poor data integration.

Over the last fifteen years, we have delivered multiple data quality audits and assessments, in different environments and, based on my experience, suggest that a few simple design choices can have a dramatic impact on your ability to manage information quality at a holistic level.

**1. Plan to capture the User and Date that information was captured, or modified**.

Data profiling and discovery tools uncover interesting patterns of behaviour in your systems.

If this behaviour can be linked to specific users, groups, or time periods then it can be managed.

For example, we may identify that x% of our information has an incorrect link between supplier and product code. We can now go ahead and fix the problem but we have no real insight as to when, or why, it occurred. Data governance, and root cause analysis, require context for our information.

Date of Capture information gives you important context.

Is this an old problem that has subsequently been resolved?

System validation may have improved but we have been left with a legacy of erroneous, poor-quality records.

Or maybe the errors can be tied back to a historical event. Do these records link back to the data migration of information from the previous ERP platform into the current one?

Maybe the errors have started recently – have there been any recent system changes that may have allowed users to capture faulty records?

Similarly, User information gives you context

Can you link patterns of behaviour to specific users or teams?

Users will develop certain patterns of behaviour, or workarounds, in order to bypass system restrictions where these are considered to be onerous, or where they do not allow the task to be performed.

For example, a system may require a Client Account ID to be captured before allowing a call to be completed.

If the client does not know, or will not share, this information the call centre agent, under pressure to complete the call timeously, may capture another Client’s Account ID instead.

Patterns in behaviour by specific users, or groups of users, are a key indicator of a broken business process.

Further investigation will need to be done by the data stewards.

Maybe the problem can be tied back to overly ambitious system validations?

Do the users need training or additional support?

In many cases, these errors can be solved by education.

Do your user’s KPIs need adjustment?

Many data quality errors are caused because users are measured on the volume of data captured rather than on the quality of data captured.

Quite possibly there will be a combination of some or all of these factors.

Designing with data quality in mind means giving context to errors! Remember that data quality plus context leads to data integrity.

You may want to add additional information to your systems.

2. Use a “soft” delete / merge

Another issue we may uncover in your information is that of so-called “orphan records” – records that have lost their partner.

Two simple examples – are a delivery note that does not have a delivery address or an order that does not have a customer.

In some cases, these records are simply captured incorrectly – the user accidentally types in a non-existent customer number.

In this case, you can do a root cause analysis as per point 1.

Referential Integrity Issues

However, in many cases, this issue is caused by one of the records being deleted after the event.

Your user linked the order to an existing customer and, later, another user deleted the customer record.

Best practices for deleting and merging records

Deletion and merging are important tools for managing data integrity.

If you want to reduce faulty or duplicate records you must give users the tools to sort out these issues.

A deletion is used when a record is no longer relevant.

There can be a number of good business reasons to delete a record – for example, a legal requirement to cease doing business with a particular client. A so-called soft delete provides you with a means to treat the record as deleted, without losing any information.

A “soft delete” means that, instead of physically removing the record from the underlying database, the record is marked as deleted. This means that users will not be able to access or use that record, but that it will still be available for audit purposes.

A merge is used when you identify that two or more records exist for the same entity. This is an extremely common problem, most efficiently picked up through the use of automated data cleansing and matching tools.

For example, the supplier records for “Mr J Bloggs, CDO at Widgets Co” and “Joseph P. Bloggs, Chief Data Officer, Widgets Company Inc.” represent the same supplier.

In order to clean up our system we need to merge these records to create a single, unified supplier and contact record.

A soft merge would link both records via a common key, allowing us to maintain the integrity of all linking transactions, before soft deleting all but one of the set.

Your system should be designed to facilitate soft deletes and soft merges.

Plan to allow the addition of linking keys to group similar or related records, as well as for the use of a soft delete.

When used in combination with a data quality metric program these simple tips provide a solid foundation to solve most data quality issues.

What role must business users take for data quality? Learn about the crucial role business users play in ensuring data quality and ownership in this insightful article.

A guide to Data Quality Root Cause Analysis: Investigate the process of Data Quality Root Cause Analysis to identify underlying causes affecting data quality issues.

Response to “Simple tips for designing for data quality”

Eight insights for Fantastic Dates | Data Quality Matters

August 5

[…] The lack of an audit date. In Simple tips for designing for data quality I discussed how dates provide valuable inputs for root cause analysis. It is important to understand […]