Data profiling is not data quality

For years, I have been a proponent of data quality measurement.

Data quality cannot exist without management (and some would argue without governance).

Meaningful data quality metrics play a critical role in managing data quality.

Posts such as Don’t blow up the whale; Changing data behaviour through KPIs; and Accuracy, Completeness and Speed of Execution offer different takes on this theme.

In recent years, more corporations have come to understand the importance of metrics in delivering better quality data.

Indeed, we are currently busy with a data metrics project for a listed South African bank – in this case to support, amongst other drivers, their BCBS 239 compliance program.

Without metrics they cannot prove compliance. More importantly, without metrics they have no meaningful way of prioritising remediation work, no understanding of what is causing poor quality data to be captured, and no basis for planning improvements to the quality of their data.
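To make the prioritisation point concrete, here is a minimal sketch of one very simple metric: completeness per field, ranked worst first. The record layout, field names and sample values are invented for illustration, and a real metrics project would measure far more than completeness.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: field name -> value (None or "" means missing).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return the share of records holding a non-blank value for each field."""
    total = len(records)
    scores: Dict[str, float] = {}
    for field in fields:
        filled = sum(1 for rec in records if (rec.get(field) or "").strip())
        scores[field] = filled / total if total else 0.0
    return scores


def worst_first(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank fields from least to most complete to suggest a remediation order."""
    return sorted(scores.items(), key=lambda item: item[1])


# Invented sample data, for illustration only.
customers: List[Record] = [
    {"name": "A Naidoo", "phone": "0821234567", "email": None},
    {"name": "B Smith", "phone": "", "email": "b@example.com"},
    {"name": "C Dlamini", "phone": "0831112222", "email": ""},
    {"name": "D Jones", "phone": "0845556666", "email": None},
]

scores = completeness(customers, ["name", "phone", "email"])
for field, score in worst_first(scores):
    print(f"{field}: {score:.0%} complete")
```

In this invented sample, email is the least complete field, so it would head the remediation list.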

Yet, as important as they are, data quality metrics do not deliver quality data!

While you cannot manage something without measuring it, measuring it is NOT managing it.

Companies need to look beyond metrics to understand how they will deal with the issues identified.

Some issues may require manual remediation.

Who will deal with the issues raised? How will they be prioritised, tracked and monitored? How will you analyse the root cause of recurring issues? These are data governance problems that require appropriate structures and a data issue management system.

Some issues must be resolved programmatically.

How will you correct common errors, standardise common fields such as phone numbers, scrub invalid values, add missing information, or match and resolve duplicate records?

Leaving these issues for manual intervention is counterproductive! More errors will be made during the remediation process, the issues will overwhelm your operational staff, and the problems will recur in the future.
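As a rough illustration of what programmatic remediation can look like, the sketch below standardises phone numbers, scrubs values that cannot be standardised, and uses the cleaned number to match duplicate records. The function names, the record layout and the validity rule (country code 27 followed by a nine-digit national number) are assumptions made for this example, not a complete matching solution.

```python
import re
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]  # hypothetical record shape


def standardise_phone(raw: Optional[str], country_code: str = "27") -> Optional[str]:
    """Return the number as +<country code><national number>, or None if invalid."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if digits.startswith("00"):
        digits = digits[2:]  # drop the international dialling prefix
    elif digits.startswith("0"):
        digits = country_code + digits[1:]  # swap the trunk prefix for the country code
    # Assumed rule: country code plus a nine-digit national number.
    if digits.startswith(country_code) and len(digits) == len(country_code) + 9:
        return "+" + digits
    return None  # scrub anything that cannot be standardised


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep the first record per standardised phone; keep unmatched records as is."""
    by_phone: Dict[str, Record] = {}
    unmatched: List[Record] = []
    for rec in records:
        cleaned = dict(rec, phone=standardise_phone(rec.get("phone")))
        if cleaned["phone"] is None:
            unmatched.append(cleaned)  # no reliable key, so do not merge
        else:
            by_phone.setdefault(cleaned["phone"], cleaned)
    return list(by_phone.values()) + unmatched


# Invented sample data, for illustration only.
records: List[Record] = [
    {"name": "A Naidoo", "phone": "082 123 4567"},
    {"name": "A. Naidoo", "phone": "+27 (82) 123-4567"},
    {"name": "B Smith", "phone": "n/a"},
]
result = deduplicate(records)
```

Matching on a single standardised field is deliberately simplistic. Real duplicate resolution usually compares several attributes and decides which values survive, which is exactly why it belongs in tooling rather than in a manual clean-up queue.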

Far too often, technology choices are made based on the ability of the IT department to deliver metrics. To solve data quality problems, IT and business must consider an end-to-end data quality capability. This will be much more than the ability to deliver metrics.