Simple tips for designing for data quality


Is your company struggling with the quality of data across and in your enterprise systems.

Most, if not all, data quality problems are caused by human error.

Approximately 80% of errors are simple data capture errors – users entering the wrong information – with the balance largely arising through poor data integration.

Over the last fifteen years I have delivered multiple data quality audits and assessments, in different environments and, based on my experience, suggest that a few simple design choices can have a dramatic impact on your ability to manage information quality at an holistic level.

1. Plan to capture the User and Date that information was captured, or modified.

Data profiling and discovery tools uncover interesting patterns of behaviour in your systems. If this behaviour can be linked to specifics users, groups, or time periods then it can be managed.

For example, we may identify that x% of our information has an incorrect link between supplier and product code. We can now go ahead and fix the problem but we have no real insight as to when, or why, it occurred. Data governance, and root cause analysis, require context for our information.

  • Date of Capture information gives you important context.

Is this an old problem that has subsequently been resolved?

System validation may have improved but we have been left with a legacy of erroneous, poor quality records.

Or maybe the errors can be tied back to a historical event. Do these records link back to the migration of information from the previous ERP platform into the current one?

Maybe the errors have started recently – have there been any recent system changes that may have allowed users to capture faulty records?

  • Similarly, User information gives you context

Can you track patterns of behaviour to specific users or teams?

Users will develop certain patterns of behaviour, or work around, in order to bypass system restrictions where these are considered to be onerous, or where they do not allow the task to be performed.

For example, a system may require a Client Account ID to be captured before allowing a call to be completed. If the client does not know, or will not share, this information the call centre agent, under pressure to complete the call timeously, may capture another Client’s ID instead.

Patterns in behaviour by specific users, or groups of users, are a key indicator of a broken business process.

Further investigation will need to be done by the data stewards.

Maybe the problem can be tied back to overly ambitious system validations?

Do the users need training or additional support? In many cases, these errors can be solved by education.

Do your user’s KPIs need adjustment? Many data quality errors are caused because users are measured on volume of data captured rather than on quality of data captured.

Quite possibly there will be a combination of some or all of these factors.

Designing with data quality in mind means giving context to errors!

You may want to add additional information to your systems.

2. Use a “soft” delete / merge

Another issue we may uncover in your information is that of so-called “orphan records” – records that have lost their partner.

Two simple examples – a delivery note that does not have a delivery address, or an order that does not have a customer.

In some cases, these records are simply captured incorrectly – the user accidentally types in a non-existent customer number.

In this case, you can do root cause analysis as per point 1.

However, in many cases this issue is caused by one of the records being deleted after the event. Your user linked the order to an existing customer and, later, another user deleted the customer record.

Deletion and merging are important tools for managing data integrity. If you want to reduce faulty or duplicate records you must give users the tools to sort out these issues.

A deletion is used when a record is no longer relevant. There can be a number of good business reasons to delete a record – for example, a legal requirement to cease doing business with a particular client. A so-called soft delete provides you with a means to treat the record as deleted, without losing any information.

A soft delete means that, instead of physically removing the record from the underlying database, the record is marked as deleted. This means that users will not be able to access or use that record, but that it will still be available for audit purposes.

A merge is used when you identify that two or more records exist for the same entity. This is an extremely common problem, most efficiently picked up through the use of automated data cleansing and matching tools.

For example, the supplier records for “Mr J Bloggs, CDO at Widgets Co” and “Joseph P. Bloggs, Chief Data Officer, Widgets Company Inc.” represent the same supplier.

In order to clean up our system we need to merge these records to create a single, unified supplier records.

A soft merge would link both records via a common key, allowing us to maintain the integrity of all linking transactions, before soft deleting all but one of the set.

Your system should be designed to facilitate soft deletes and soft merges.

Plan to allow the addition of linking keys to group similar or related records, as well as for the use of a soft delete.

When used in combination with a data quality metric program these simple tips provide a solid foundation to solve most data quality issues.

Advertisements

One thought on “Simple tips for designing for data quality

  1. Pingback: Eight insights for Fantastic Dates | Data Quality Matters

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s