The “infinite monkey” approach to data cleansing…


The old myth states that if an infinite number of monkeys were given typewriters and allowed to bash away at random, one of them would eventually author the complete works of Shakespeare. The myth ignores the infinite amount of trash that would be generated.

Yet many organisations are prepared to approach data cleansing in exactly this way, employing an army of cheap, temporary labour to manually validate and correct data errors. Considering that almost all data quality issues are created by data capture error, this approach is doomed to fail. How many new errors are created for every error corrected? How do you know that you have a better result than you started with? How do you ensure a consistent application of business rules across the whole team? How do you ensure the process can be repeated next week, next month or in a year’s time, when you need to do the job again?

To maintain consistent and accurate data, a less casual approach is suggested:

1.) Start to think about Data Governance. What are the rules that should be applied to data? Do they vary across applications or departments? What is your current state of Data Quality – has it been measured? What is your desired state? How will you get from the current state to the desired state? How will you maintain these rules? Who else in your organisation is having similar thoughts? How can you work together?
2.) Maximise automated data validation. For restricted value fields you can use drop-down lists or similar coding techniques to cut data entry errors. For re-use across multiple applications you may want to consider a Data Quality Centre of Excellence, where Data Cleansing and Matching rules can be created for the organisation and published as services for use by any application. And for parsing and interpreting free-format text fields, automated rules are the only long-term solution.
3.) Consider the use of Business Process Guidance (BPG) solutions to ensure that users capture data as accurately as possible. BPG systems provide users with accurate guidance on required data quality standards (which may be dictated by legislation or by your data governance forum) and translate these into system-specific capturing recommendations, ensuring consistency and improving accuracy.
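
To make the second point a little more concrete, here is a minimal sketch in Python of what a shared, repeatable validation rule might look like. Everything in it is an assumption made for illustration: the field names (`title`, `phone`), the list of allowed titles, the phone pattern and the function names are hypothetical, not taken from any particular product. The idea is simply that the rule is written once, applied identically to every record, and can be re-run next month to measure whether quality has improved.

```python
import re

# Hypothetical restricted-value list; in practice this would come from
# your data governance forum, not be hard-coded.
ALLOWED_TITLES = {"Mr", "Mrs", "Ms", "Dr"}


def clean_title(raw):
    """Normalise a title: trim spaces, drop trailing full stops, fix case."""
    return raw.strip().rstrip(".").capitalize()


def validate_record(record):
    """Return a list of rule violations for one record (empty means valid)."""
    errors = []

    # Restricted-value rule: the cleaned title must be in the allowed list.
    title = clean_title(record.get("title", ""))
    if title not in ALLOWED_TITLES:
        errors.append("title: %r is not an allowed value" % record.get("title"))

    # Free-format rule (illustrative only): after removing spaces, hyphens
    # and brackets, a phone number is an optional "+" and 7 to 15 digits.
    phone = re.sub(r"[\s\-()]", "", record.get("phone", ""))
    if not re.fullmatch(r"\+?\d{7,15}", phone):
        errors.append("phone: %r is not a valid number" % record.get("phone"))

    return errors


def quality_score(records):
    """Fraction of records that pass every rule: a simple, repeatable measure."""
    if not records:
        return 1.0
    valid = sum(1 for r in records if not validate_record(r))
    return valid / len(records)
```

Because the same functions are applied to every record, the score answers the question of whether you have a better result than you started with: measure before cleansing, measure again afterwards, and compare. Published as a service, the same rules could be called by any capturing application.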
