Many large system implementations are driven by a perception (often fuelled by the sales team) that the new system will miraculously address the data quality issues in the existing system that inhibit the business's ability to function effectively.
We have, for example, seen a number of clients replace existing billing systems with large ERP packages. Similarly, many technology-driven Master Data Management projects are driven by a desire to create a clean record for an entity – be it a client, supplier or product.
The irony is that, almost without fail, the system integrator excludes data take-on (and cleaning) from the project plan – this remains the responsibility of the client. So we find situations like that of the Johannesburg municipality – an award-winning SAP implementation that delivered to the project plan. Yet after spending a reputed R1 billion (US$150 million) to address billing issues, the city is now even worse off – it is owed an estimated R12 billion that cannot be accurately traced, is in dispute, or will have to be written off. SAP denies that the system is the problem. The City agrees: the problems can be traced to data quality issues arising from the data take-on and the integration of data from various disparate systems into SAP.
So why were these challenges overlooked? Well, the City is not alone. Various studies link data quality challenges to the failure of almost 80% of big IT projects – ERP, CRM, MDM and Data Warehousing projects are all reliant on accurate data. In most cases, data migration is ignored or left until last – a recipe for failure.
Rather than leaving data quality and data integration challenges to the "end" of the project, and then discovering that substantial design changes and redevelopment are required to address them, project teams must plan for data integration from Day 1. It is the business's responsibility to ensure that the chosen supplier is not allowed to cop out of the data take-on challenge! Surely a successful project must include data that is fit for purpose?
A cross-functional (business and technical) team should be established early as a stream within the broader project team. This team should be responsible for identifying and documenting the required to-be state of the data in the new system.
Each existing system should be assessed for data quality and fitness for purpose – an automated data profiling and discovery tool such as Trillium Software Discovery will enable business users to drive the process, will shave years off the effort on a big project, and will ensure that all significant issues are discovered early.
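To make the idea concrete, here is a minimal sketch of the kind of column-level profiling such a tool automates across every table in every source: completeness, cardinality and dominant character patterns. It assumes a pandas DataFrame loaded from a source-system extract; the file and column names are purely illustrative and not tied to any particular product.

```python
# Minimal profiling sketch: the kind of per-column statistics a discovery
# tool automates at scale. Field and file names are illustrative only.
import re
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return basic quality metrics per column: fill rate, distinct values,
    and the most common character pattern (letters -> A, digits -> 9)."""
    rows = []
    for col in df.columns:
        series = df[col].dropna().astype(str)
        patterns = series.map(
            lambda v: re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", v))
        )
        rows.append({
            "column": col,
            "fill_rate": len(series) / len(df) if len(df) else 0.0,
            "distinct": series.nunique(),
            "top_pattern": patterns.value_counts().idxmax() if not patterns.empty else "",
        })
    return pd.DataFrame(rows)

customers = pd.read_csv("billing_customers_extract.csv")  # hypothetical extract
print(profile(customers).to_string(index=False))
```

Even this crude view quickly surfaces empty mandatory fields, inconsistent formats and suspiciously low (or high) cardinality – the issues that a dedicated profiling tool finds, explains and lets business users explore interactively.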
The team can then do a source-to-target mapping for each system – ensuring that the final design is compatible with existing data, and planning the remediation steps that will need to take place during the data migration.
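A sketch of how such a mapping might be captured in a simple, machine-readable form so that both business and IT can review it, and so that it can later drive the migration and the remediation plan. The source fields, target fields and rules below are hypothetical examples, not a prescribed schema.

```python
# Illustrative source-to-target mapping for one legacy source system.
# Each entry pairs a source field with its target field and the agreed
# conversion or remediation rule. All names and rules are made up.
SOURCE_TO_TARGET = [
    # (source field,  target field,         rule / remediation note)
    ("CUST_NO",       "BusinessPartnerID",  "prefix with legacy system code"),
    ("CUST_NAME",     "Name1",              "parse salutation out of free text"),
    ("ADDR_LINE1",    "Street",             "standardise and verify address"),
    ("DATE_OPENED",   "CreatedOn",          "convert DD/MM/YY to ISO 8601 date"),
    ("ACC_STATUS",    "Status",             "map legacy status codes to target codes"),
]

for src, tgt, rule in SOURCE_TO_TARGET:
    print(f"{src:12} -> {tgt:20} {rule}")
```

Keeping the mapping explicit like this makes gaps obvious early: every target field without a source, and every source field without a home, is a design or remediation decision that someone must own before go-live.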
These may be simple conversions, e.g. of data formats or data types, that can be done by any ETL tool or even SQL scripts. Where data sets are not organised and structured – e.g. free-text fields such as names, addresses or material descriptions – or where clear attribute-to-attribute data maps are not available, the ETL tool may not be up to the challenge: a specialist data cleansing tool that provides context-sensitive parsing and fuzzy matching across fields will be required. Manual remediation efforts will also need to be identified and planned.
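The contrast between the two classes of remediation can be illustrated with a small sketch: a rule-based date conversion that any ETL tool or script handles trivially, next to a naive fuzzy comparison of free-text names that hints at why a specialist tool, with proper parsing of name components, is needed. The example values are invented and the matching shown is deliberately crude.

```python
# Two classes of remediation. The date conversion is the easy, rule-based
# case; the name comparison shows why free-text fields need context-sensitive
# parsing and fuzzy matching rather than simple scripts. Values are made up.
from datetime import datetime
from difflib import SequenceMatcher

def convert_date(legacy: str) -> str:
    """Simple rule-based conversion: legacy DD/MM/YY to an ISO 8601 date."""
    return datetime.strptime(legacy, "%d/%m/%y").date().isoformat()

def name_similarity(a: str, b: str) -> float:
    """Crude fuzzy score on lightly normalised names. A specialist tool would
    first parse each name into components (title, first name, surname) and
    compare field by field, which is far more reliable than this."""
    norm = lambda s: " ".join(s.lower().replace(".", " ").split())
    return SequenceMatcher(None, norm(a), norm(b)).ratio()

print(convert_date("03/11/98"))                            # 1998-11-03
print(name_similarity("Dr. J Smith", "Smith, John (Dr)"))  # low score for the same person
```

The second print shows the problem: two renderings of the same customer score poorly under naive string comparison, so duplicates slip through – exactly the gap that context-sensitive parsing and cross-field matching are designed to close.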
Best practice suggests that data should be cleaned (as far as possible) within each source before being consolidated into the new system. Automation means that the cleansing processes built for each source can be applied to the data migration batch process, and can then continue to be applied to the new system (as real-time cleansing and matching services) to maintain data quality going forward – otherwise you run the risk of quality degrading rapidly even if the data take-on was successful.
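The principle is simply that one set of cleansing rules serves both the once-off migration and the ongoing capture of new records. A minimal sketch, assuming a handful of invented standardisation rules and field names (not any real tool's API):

```python
# One set of cleansing rules, reused in two places: the once-off migration
# batch and the ongoing service that cleanses new records as they are
# captured. Rules and field names are illustrative assumptions.
def cleanse_customer(record: dict) -> dict:
    """Apply the agreed standardisation rules to a single customer record."""
    cleaned = dict(record)
    cleaned["name"] = " ".join(record.get("name", "").split()).title()
    cleaned["email"] = record.get("email", "").strip().lower()
    cleaned["postal_code"] = record.get("postal_code", "").replace(" ", "").upper()
    return cleaned

def migrate_batch(source_records):
    """Once-off take-on: cleanse every legacy record before loading."""
    return [cleanse_customer(r) for r in source_records]

def on_new_record(record):
    """Ongoing (real-time) service: the same rules applied at the point of
    capture, so quality does not degrade after go-live."""
    return cleanse_customer(record)
```

Because both paths call the same rules, the standards agreed during migration are enforced on every record created after go-live, rather than being a one-off clean-up that slowly erodes.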
The worst decision you can take is to simply load all data from all sources. This will propagate every existing data issue into the new system, and introduce new ones such as duplicate records and multiple conflicting standards.
The most cost-effective time to plan for data quality is early – the opportunity should not be wasted – and the risk of not doing so must be clearly communicated to the project sponsors!