Data quality solves the unbearable truthiness of analytics!

“Before making an important decision, get as much as you can of the best information available and review it carefully, analyze it and draw up worst case scenarios. Add up the plus or minus factors, discuss it with your team and do what your guts tell you to do.” – The Mafia Manager

Decision support, or business intelligence, has been a major driver for CIOs for many years. The current focus on “big data” and social media analytics is a logical extension of inward-facing analysis capability, allowing decisions to be made using the wealth of information available on the web.

Most BI sales, and projects, are delivered on the premise that any decision is better than no decision. Decision makers need information in order to decide – our job is to provide them with that information.

Big data’s three dimensions – volume, velocity and variety – represent an extreme view of the challenges faced in any data analytics project. How do we consolidate multiple data sources, with varying levels of quality, and deliver analysis that is timely and relevant?

Analytics projects are caught between a rock and a hard place. The CEO needs his report within the hour, but the state of the data means we cannot generate the numbers. So we work around the problems – perhaps we ignore records that do not hold information in the right format, or we substitute invalid values with valid ones.

We manipulate the inputs so that we can generate the outputs we need while they are still relevant! The problem is not that we do this – it is a necessary evil of working in an imperfect world and, most of the time, it will not have a significant impact on decision making. Except when it does!
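The two workarounds above – dropping records in the wrong format and substituting invalid values with valid ones – can be made visible rather than silent. A minimal sketch (the `revenue` field name and the default of 0.0 are hypothetical, not from any particular system) that counts how much of the input was manipulated:

```python
# Sketch of the workarounds described above: drop records with a missing
# field, substitute a default for malformed values, and keep a tally of
# how many records were touched so nobody mistakes the output for clean data.

def clean_revenue_records(records, default=0.0):
    """Return (cleaned, dropped, substituted) for a list of raw record dicts."""
    cleaned, dropped, substituted = [], 0, 0
    for rec in records:
        raw = rec.get("revenue")
        if raw is None:                      # field missing entirely: drop the record
            dropped += 1
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):      # wrong format: substitute the default
            value = default
            substituted += 1
        cleaned.append({**rec, "revenue": value})
    return cleaned, dropped, substituted
```

The point of returning the `dropped` and `substituted` counts alongside the cleaned data is that the manipulation stops being invisible: the same counts can later feed a confidence indicator on the report itself.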

Decisions need to be made based on an evaluation of the “best information available.” It is when we present poor information as good information that the waters get muddy. Are business decisions being made based on truth, or based on truthiness?

According to Stephen Colbert, “Truthiness is what you want the facts to be, as opposed to what the facts are. What feels like the right answer, as opposed to what reality will support.”

If information has been significantly manipulated to produce a report, surely the decision maker deserves to understand this? What level of confidence do you have in the underlying data? Was key information missing, or did it vary widely (and inconsistently) from the previous quarter’s numbers? Could this have been an error in translation or amalgamation?

Legislation and regulations such as King III, Basel II and the Protection of Personal Information bill all recognize that key indicators such as financial forecasts or risk models can only be accurate within the margin of error supported by the underlying data. They seek to protect the public against risk by penalising companies and individuals that cannot provide a measure of the quality of the data supporting key public metrics.

Common sense suggests that we need to give our business leaders similar protection – by providing an indicator of confidence on every report, linked to the quality of the supporting data. If they must make a decision based on gut feeling, surely that is better than making one based on truthiness?
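What might such a confidence indicator look like? One possible form – an assumption for illustration, not a standard – is simply the share of source records that reached the report without being dropped or substituted, mapped onto a traffic-light label for the report header. The 0.95 and 0.80 thresholds below are arbitrary placeholders a business would set for itself:

```python
# Sketch of a report-level confidence indicator: the fraction of records
# used as-is, plus a simple traffic-light label for the report header.
# Thresholds are illustrative placeholders, not an industry standard.

def confidence_score(total, dropped, substituted):
    """Fraction of records used without manipulation; 1.0 means untouched data."""
    if total == 0:
        return 0.0
    return (total - dropped - substituted) / total

def confidence_label(score):
    """Map the score onto a label a decision maker can read at a glance."""
    if score >= 0.95:
        return "HIGH"
    if score >= 0.80:
        return "MEDIUM"
    return "LOW"
```

A single number will never capture every factor that erodes trust in data, but even a crude gauge like this tells the reader whether they are looking at truth or truthiness.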
