In baseball, a home run (abbreviated HR, also “homer”, “dinger”, “bomb”, or “four-bagger”) is scored when the ball is hit in such a way that the batter is able to circle the bases and reach home safely in one play, without any errors being committed by the defensive team in the process.
For a data quality home run I would suggest that we need our data quality batter to move through all four data quality bases in order to achieve a data quality improvement.
So what are the four data quality bases?
First data quality base – Data Profiling
The first step of any data quality process must be to compare our data to our agreed ideal data set.
We do this by profiling our data set – measuring compliance of our data to our agreed standards and rules.
Basic data profiling can be done using SQL. However, the advanced data profiling and discovery capabilities of tools like Trillium Software Discovery put data profiling into the hands of the business data steward, while providing more detailed insights more quickly than SQL approaches can.
This means that our business data stewards can quickly identify anomalies in data that require intervention, make decisions, and act on them without requiring IT support.
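To make the profiling step concrete, here is a minimal sketch in Python. The records, field names, and validation rules are all hypothetical, invented for illustration; a real profiling tool would run far richer discovery against live data sources.

```python
import re

# Hypothetical sample records; in practice these would come from a database query.
customers = [
    {"id": 1, "email": "ann@example.com", "postcode": "2000"},
    {"id": 2, "email": "not-an-email",    "postcode": "2001"},
    {"id": 3, "email": None,              "postcode": "20X1"},
]

# Agreed standards and rules expressed as simple validation checks (illustrative only).
rules = {
    "email":    lambda v: v is not None and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "postcode": lambda v: v is not None and re.fullmatch(r"\d{4}", v) is not None,
}

def profile(records, rules):
    """Return the fraction of records that comply with each rule."""
    total = len(records)
    return {
        field: sum(1 for r in records if check(r.get(field))) / total
        for field, check in rules.items()
    }

report = profile(customers, rules)
```

The output is a compliance score per field, which is exactly the kind of measurement against agreed standards that the profiling step produces.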
Second data quality base – Monitoring
“You can’t manage what you can’t measure” – Peter Drucker
The second requirement for a successful data quality deployment is the ability to monitor changes to data quality as improvements kick in. Various levels of detail are needed – from governance dashboards to detailed exception reports or issue logs for hands on stewards and operational staff.
Data quality reports allow us to measure the value being driven in the form of data quality improvements, allow us to focus in on problem areas that need more attention, and provide input for root cause analysis.
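The monitoring idea can be sketched in a few lines: track compliance scores over time and flag fields that fall below a governance target, together with their trend. The scores, field names, and threshold below are invented for illustration.

```python
# Hypothetical daily compliance scores per field (most recent last).
history = {
    "email":    [0.91, 0.93, 0.95],
    "postcode": [0.88, 0.84, 0.79],
}

THRESHOLD = 0.90  # illustrative governance target

def exceptions(history, threshold):
    """Flag fields whose latest compliance is below target, with the trend."""
    report = {}
    for field, scores in history.items():
        latest = scores[-1]
        if latest < threshold:
            trend = "worsening" if latest < scores[0] else "improving"
            report[field] = {"latest": latest, "trend": trend}
    return report

issues = exceptions(history, THRESHOLD)
```

A summary like this feeds a governance dashboard, while the per-field detail points hands-on stewards at the problem areas needing root cause analysis.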
Third data quality base – Cleansing
Our steward must now act on those decisions!
Data issues must be allocated to a team for resolution.
Some issues may require manual remediation. A data quality case must be opened and managed to ensure these issues are addressed at source.
Other issues may be resolved through an automated cleaning process. Stewards should be able to quickly and easily define rules to standardize, enrich, and match data.
Wherever possible, stewards should reuse the work done to get to first base – the data profiling and analysis – to ensure that rules can be deployed promptly, and that rules defined by the data steward are not lost in translation during technical implementation.
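As a small sketch of an automated cleansing rule, here is a standardization step in Python. The synonym table and field are hypothetical; real tools let stewards define such rules (plus enrichment and matching) without writing code.

```python
# Illustrative standardization rule: map messy country values to a canonical code.
country_synonyms = {
    "usa": "US", "u.s.a.": "US", "united states": "US",
    "uk": "GB", "united kingdom": "GB",
}

def standardize_country(value):
    """Standardize a free-text country value to a canonical form."""
    if value is None:
        return None
    return country_synonyms.get(value.strip().lower(), value.strip().upper())

cleaned = [standardize_country(v) for v in ["USA", " united kingdom ", "FR", None]]
```

The null is deliberately left untouched: missing data is a case for manual remediation at source, not silent automated repair.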
Fourth data quality base – Deployment
Rules must now be deployed – either as part of a batch ETL process or, increasingly these days, as real time data quality services.
Batch data quality processes clean existing, dirty data. Real time services ensure that new data entering our organisation is cleaned on entry.
An enterprise data quality solution must allow business stakeholders to develop data quality processes that can easily be deployed in both real time and batch.
We must cater for the various core components in our architecture – maybe we are running SAP for ERP, Microsoft for CRM, and have various legacy applications on the mainframe. We may have an enterprise bus from one vendor and an ETL tool from another.
The enterprise data quality platform should provide ease of integration with any and all of these components.
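The "define once, deploy in both modes" point can be sketched as follows. The rule, field names, and wrapper functions are invented for illustration; the idea is simply that the steward's rule is written once and reused unchanged in batch and real-time contexts.

```python
def cleanse_record(record):
    """A single cleansing rule, defined once by the steward."""
    record = dict(record)  # avoid mutating the caller's data
    record["email"] = (record.get("email") or "").strip().lower() or None
    return record

# Batch deployment: run the same rule over an existing, dirty data set.
def cleanse_batch(records):
    return [cleanse_record(r) for r in records]

# Real time deployment: wrap the identical rule as a service entry point,
# e.g. behind an HTTP endpoint or an enterprise bus listener,
# so new data is cleaned on entry.
def cleanse_service(record):
    return cleanse_record(record)

batch_out = cleanse_batch([{"email": "  Ann@Example.COM "}])
live_out = cleanse_service({"email": ""})
```

Because both deployment paths call the same function, there is no drift between what the batch process and the real-time service consider "clean".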
In my experience, far too many data quality implementations get stuck on first or second base.
If your strategy, or your technology platform, does not easily support all four steps, the chances are that you will struggle to deliver quality data.
Image sourced from http://en.wikipedia.org/wiki/Chase_Utley