The recent question posed on the IAIDQ community was largely answered with “It doesn’t matter because IT and business staff don’t see a difference.”
However, Dave Silverstein‘s comment used an interesting analogy.
To paraphrase: if you walk into a wood you see trees, and these can be likened to pure data. From an aerial photograph you can get an idea of the extent of the wood. You can estimate the number of trees per hectare and calculate the return in lumber; now you have information, but you no longer see the individual trees. As data is translated into information you lose detail, and as a result the potential for error increases.
The analogy is a good one in that lumber is a long-term investment, just like data. And data that is not maintained will deteriorate over time, just like the wood.
After a year, our young wood may start to show signs of a fungal infection. Our information won't change, but one in ten trees is beginning to decay from the inside and so will no longer make good lumber.
If the problem is ignored the infection will spread. More trees will be destroyed, and our information, although it hasn't changed, will no longer be of good quality. Initially our information will be 90% accurate; over time it may have no bearing on reality, as the wood, which appears healthy from the air, is rotten to the core. Our information quality may be terrible, but we won't know, and we will not be able to make appropriate decisions to save the wood (and the long-term lumber position).
At some point we may, by chance, take a walk through the wood. We identify the diseased trees, but it is too late to save them. We implement a cleansing programme, cutting and burning the infected specimens. We can adapt our calculation based on the ten percent loss, and our information will remain of good quality. This is similar to many tactical data cleansing projects: we improve the immediate quality of the data through a one-off action, and this ripples through to the information quality. However, like most data cleansing projects, the task was labour-intensive, it is not repeatable, and the next crisis that hits the wood may not be identified.
On the other hand, we may make a concerted attempt to manage the wood. We take regular walks through the wood and catch signs of disease early. We can take remedial action early and repeatedly, ensuring that the wood remains healthy and that our forecast lumber yield remains accurate. Similarly, to maintain data quality we need a plan. Regular data quality audits, combined with automated data cleansing in batch and real time, will allow us to catch and correct data quality issues before they become endemic in our data.
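A regular audit of this kind can be as simple as running a fixed set of validation rules over the data on a schedule and tracking the failure counts. The sketch below is a minimal, hypothetical illustration (the dataset, rule names, and thresholds are invented for the example, not taken from any specific tool):

```python
# A minimal sketch of a repeatable data quality audit: apply a fixed set of
# validation rules to every record and count the failures per rule. Running
# this regularly is the "walk through the wood" that catches decay early.

def audit(records, rules):
    """Apply each named rule to every record; return failure counts per rule."""
    failures = {name: 0 for name in rules}
    for record in records:
        for name, check in rules.items():
            if not check(record):
                failures[name] += 1
    return failures

# Hypothetical rules for a customer dataset (illustrative only).
rules = {
    "email_present": lambda r: bool(r.get("email")),
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}

records = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 34},          # missing email: decay starting
    {"email": "b@example.com", "age": 250},  # implausible age
]

report = audit(records, rules)
print(report)  # {'email_present': 1, 'age_in_range': 1}
```

Because the rules are declared once and re-run on every audit, the check is repeatable, unlike a one-off cleansing project, and a rising failure count flags an infection before it spreads.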
By managing our data quality, our information quality takes care of itself.