Data Migrations good opportunity to improve data quality

Many large system implementations are driven by a perception (fueled by the sales team) that the new system will miraculously address the data quality issues in the existing system that inhibit the business’s ability to function effectively.

We have, for example, seen a number of clients replacing existing billing systems with large ERP packages. Similarly, many technology-driven Master Data Management projects are motivated by a desire to create a clean record for an entity – be this client, supplier, product or whatever.

The irony is that, almost without fail, the system integrator excludes data take-on (and cleaning) from the project plan – this remains the responsibility of the client. So we find situations like that of the Johannesburg municipality – an award-winning SAP implementation that delivered to the project plan. Yet after spending a reputed R1 billion (US$150 million) to address billing issues, the city is now even worse off – being owed an estimated R12 billion that cannot be accurately traced, is in dispute, or will have to be written off. SAP denies that the system is the problem. The City agrees that the problems can be traced to data quality issues related to the data take-on and the integration of data from various disparate systems into SAP.

So why were these challenges overlooked? Well, the City is not alone. Various studies link data quality challenges to the failure of almost 80% of big IT projects – ERP, CRM, MDM and Data Warehousing projects are all reliant on accurate data. In most cases, data migration is ignored or left until last – a recipe for failure.

Rather than leaving data quality and data integration challenges to the “end” of the project, and then discovering that substantial design changes and redevelopment are required to address them, project teams must plan for Data Integration from Day 1. It is the business’s responsibility to ensure that the chosen supplier is not allowed to cop out of the data take-on challenge! Surely a successful project must include data that is fit for purpose?

A cross-functional (business and technical) team should be established early as a stream within the broader project team. This team should be responsible for identifying and documenting the required “to-be” state of data in the new system.

Each existing system should be assessed for data quality and fitness for purpose – an automated data profiling and discovery tool such as Trillium Software Discovery will enable business users to drive the process, will shave years off the process (for a big project) and will ensure all significant issues are discovered early.
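To make the idea of profiling concrete, here is a minimal sketch in Python of the kind of column-level checks a profiling tool automates. It is not how any particular product works; the function names and the sample phone numbers are hypothetical, and a real tool covers far more (cross-column dependencies, keys, outliers).

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


def profile_column(values):
    """Return basic quality metrics for one column of raw source values."""
    total = len(values)
    # Treat None, empty strings and whitespace-only strings as missing.
    populated = [v.strip() for v in values if v is not None and v.strip()]
    return {
        "total": total,
        "missing": total - len(populated),
        "distinct": len(set(populated)),
        "patterns": Counter(value_pattern(v) for v in populated),
    }


# Hypothetical sample: phone numbers extracted from a legacy billing system.
phones = ["011-555-0101", "0115550102", None, "", "011-555-0101", "n/a"]
report = profile_column(phones)
print(report)
```

Even on six values, the pattern counts show two competing phone formats and a placeholder ("n/a") stored as if it were data, which is exactly the sort of finding the team needs before the target design is fixed.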

The team can then do a source to target mapping for each system – ensuring that the final design is compatible with existing data, and planning for remediation steps that will need to take place during the data migration.
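A source-to-target mapping can be kept as data and tested against real source records long before the migration runs. The sketch below assumes a made-up legacy customer table and made-up target constraints; every field name and limit is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class FieldMap:
    source: str             # column name in the legacy system (hypothetical)
    target: str             # column name in the new system (hypothetical)
    max_length: int         # length limit imposed by the target design
    required: bool = False  # whether the target rejects empty values


# Hypothetical mapping for a legacy customer table.
MAPPING = [
    FieldMap("cust_nm", "customer_name", max_length=20, required=True),
    FieldMap("acc_no", "account_number", max_length=10, required=True),
    FieldMap("tel", "phone", max_length=12),
]


def find_mapping_issues(record: dict, mapping=MAPPING):
    """List the reasons a source record would not fit the target design."""
    issues = []
    for fm in mapping:
        value = (record.get(fm.source) or "").strip()
        if fm.required and not value:
            issues.append(f"{fm.source} -> {fm.target}: required value missing")
        elif len(value) > fm.max_length:
            issues.append(
                f"{fm.source} -> {fm.target}: length {len(value)} exceeds {fm.max_length}"
            )
    return issues


legacy_row = {
    "cust_nm": "Acme Trading and Logistics (Pty) Ltd",
    "acc_no": "",
    "tel": "011-555-0101",
}
issues = find_mapping_issues(legacy_row)
for issue in issues:
    print(issue)
```

Running checks like this over every source system early shows whether the answer is to change the target design (a longer name field) or to plan a remediation step (capturing the missing account numbers).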

These may be simple conversions, e.g. of data formats or data types, that can be done by any ETL tool or even SQL scripts. Where data sets are not organised and structured (e.g. free text fields such as name, address or material descriptions), or where clear attribute-to-attribute data maps are not available, the ETL tool may not be up to the challenge – a specialist data cleansing tool that provides context-sensitive parsing and fuzzy matching across fields will be required. Manual remediation efforts will also need to be identified and planned.
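As a rough illustration of why free text needs more than a format conversion, the sketch below normalises addresses and then compares them with Python's standard `difflib.SequenceMatcher`. This is a deliberately simple stand-in, not what a specialist cleansing tool does; the abbreviation table, the 0.85 threshold and the sample addresses are all assumptions.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real project would derive this from profiling.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def normalise(text: str) -> str:
    """Lower-case, replace punctuation with spaces, expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a: str, b: str) -> float:
    """Similarity of two free-text values after normalisation (0.0 to 1.0)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(records, threshold=0.85):
    """Compare every pair of records; fine for a sample, too slow for millions of rows."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = similarity(records[i], records[j])
            if score >= threshold:
                pairs.append((records[i], records[j], round(score, 2)))
    return pairs


addresses = [
    "12 Main St., Johannesburg",
    "12 main street johannesburg",
    "45 Oak Ave, Pretoria",
]
matches = likely_duplicates(addresses)
print(matches)
```

An exact comparison in SQL would treat the first two addresses as different records; after normalisation they are identical. Character-level similarity alone is still crude, which is why context-sensitive parsing (knowing which token is a street type, a suburb or a unit number) matters on real data.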

Best practice suggests that data should be cleaned (as far as possible) within each source, before being consolidated into the new system. Automation means that the cleansing processes used for each source can be applied to the data migration batch process and can continue to be applied to the new system (as real-time cleansing and matching services) to maintain data quality going forward – otherwise you run the risk of data quality degrading rapidly even if the data take-on was successful.
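The reuse described above comes down to keeping the cleansing rules in one place and calling them from both the batch migration and the live system. A minimal sketch, assuming hypothetical `name` and `phone` fields and deliberately simplistic rules:

```python
def clean_customer(record: dict) -> dict:
    """Shared rule set: collapse whitespace and standardise case in the name,
    keep only digits in the phone number."""
    cleaned = dict(record)  # do not modify the caller's record
    cleaned["name"] = " ".join(record.get("name", "").split()).title()
    cleaned["phone"] = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    return cleaned


def migrate_batch(source_rows):
    """Batch path: apply the shared rules to every row during data take-on."""
    return [clean_customer(row) for row in source_rows]


def on_new_record(row):
    """Real-time path: apply the same rules when a record is captured later."""
    return clean_customer(row)


batch = migrate_batch([{"name": "  acme   TRADING ", "phone": "011-555 0101"}])
live = on_new_record({"name": "acme trading", "phone": "(011) 555-0101"})
print(batch[0] == live)
```

Because both paths call the same function, a record captured after go-live is standardised exactly as the migrated data was, so it can still be matched against it.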

The worst decision you can take is to simply load all data from all sources. This will propagate all existing data issues into the new system, and add additional issues such as duplicate data and multiple conflicting standards.

The most cost-effective time to plan for data quality is early – the opportunity should not be wasted – and the risk of not doing this must be clearly communicated to the project sponsors!

6 thoughts on “Data Migrations good opportunity to improve data quality”

  1. Some great points raised, thanks for sharing.

    One point I would add is that immediately after a migration is the perfect time to launch an ongoing data quality management initiative:

    * The right documentation and rules exist on the target system
    * The right people are in the same vicinity
    * The right drive (hopefully) for data quality pervades the management and sponsors
    * The right equipment and technology are typically still kicking around

    It all makes sense, but I’ve only seen a few companies take this leap, and every one of them found it hugely beneficial.

    1. Dylan – you are absolutely correct. However, in my experience many organisations do not have the springboard to achieve this because they haven’t addressed data issues at all. So they end up with the same mess they were trying to replace – having spent hundreds of millions and years of effort.

  2. One wonders why an organisation would want to implement a new system solution. It cannot be because the current system is producing consistently accurate, timely and expected results! There are already data inconsistencies, and the data “cannot be trusted” for decision-making purposes, and yet the data migration activity in most projects is aimed at simply pumping the old data into the new system – occasionally after addressing some of the obvious errors!

    Data is often only mentioned in the implementation project plans as the “data migration”, and generally too close to the end of the project. In my experience, by the time the project gets to the data migration, any budget that was allocated to a data quality initiative has often been channelled into other project activities, so now there is also no budget!

    It is interesting that there is often a “governance program” included in the initiation of a change management program to “manage the project”, but again there are very few (if any) projects that specifically focus on establishing, or confirming the existence of, the data governance infrastructure required to:
    – manage the data management functions, such as modelling, mapping, metadata, etc., from the existing systems to the new system
    – determine and agree the required levels of data quality and the expected results.

    Instead, the questions get asked when it is too late: the new solution is already implemented, the data has been migrated and the expected results have not been achieved! At this point costly change projects are implemented in panic mode to try to “plug the holes” and correct the issues, and still no thought is given to proper governance and accountability.

    Perhaps all implementation programs (methodologies) should include the establishment of data governance and data quality forums with, as suggested by Dylan, ongoing activities planned to reassess and evaluate data quality during and after the project.
