Haz you too much data?

[Image: scrabble-cat-haz-is-not-a-word]

In data-driven blues, Dr Barry Devlin debunks a couple of common big data myths, including the myth that big data analysis means collecting and analysing all data (or even all relevant data).

Barry’s article, which is well worth reading, focuses on the psychological aspects of decision making. He makes the case that intuition remains an important aspect of decision making – particularly given the information overload that currently exists.

At the same time, he recognises that businesses that can make better use of information, and gain better insights from it, should do so.

How do we avoid information overload?

A large percentage of the exabytes of data being generated every day is, of course, irrelevant pictures of cats (see exhibit A), inane YouTube videos and similar clutter.

Even when data has more potential to be relevant, it can be extremely difficult to filter the signal from the noise.

We need to start our filtering by understanding the intent of our analytics.

If our goal, for example, is to get insights on reducing our marketing costs, this will give us a clear set of sources: our CRM, our web logs, our online advertising data, etc. Can we filter by time period, by location, or by campaign? By applying common sense we can identify the more relevant data even before beginning any kind of analysis.
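As a rough illustration of filtering by intent, the sketch below narrows a set of marketing records by time period, location and campaign before any analysis starts. The `Event` record, its field names and the sample values are all hypothetical; real CRM or web-log data will look different.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape; real CRM or web-log fields will differ.
    day: date
    region: str
    campaign: str
    cost: float


def select_relevant(
    events: Iterable[Event],
    start: date,
    end: date,
    regions: Optional[set] = None,
    campaigns: Optional[set] = None,
) -> list:
    """Keep events inside the date range and, if given, the chosen regions and campaigns."""
    return [
        e for e in events
        if start <= e.day <= end
        and (regions is None or e.region in regions)
        and (campaigns is None or e.campaign in campaigns)
    ]


# Made-up sample data, for illustration only.
events = [
    Event(date(2015, 1, 10), "ZA", "spring-sale", 120.0),
    Event(date(2015, 2, 3), "UK", "spring-sale", 80.0),
    Event(date(2015, 2, 20), "ZA", "newsletter", 15.0),
    Event(date(2014, 11, 5), "ZA", "spring-sale", 60.0),
]

relevant = select_relevant(
    events,
    start=date(2015, 1, 1),
    end=date(2015, 3, 31),
    regions={"ZA"},
    campaigns={"spring-sale"},
)
total_cost = sum(e.cost for e in relevant)
```

The point is that the filters come from the question being asked, not from the data: change the goal and the date range, regions and campaigns change with it.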

Prepare with purpose

Most big data case studies show that data preparation (identifying, quantifying and filtering the noise to get to the insight) takes a disproportionate amount of time: between 60% and 80% of the analytics effort is spent on preparation.

A lot of this work is simply applying the filters and logic already agreed. Of course, we may adapt our preparation based on insights discovered during the process – analytics and preparation should be tightly integrated – but our purpose will guide which data must be kept and which can be discarded.

We should also, at this stage, be making decisions about how and when we will update data sets. Do we want to keep all data? Will we replicate data? Will we set up a rolling 30-day window?
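To make the rolling window option concrete, here is a minimal sketch that splits records into those inside the last 30 days and those that can be discarded. The function names, the `(day, payload)` record shape and the choice to count today as day one of the window are assumptions made for the example.

```python
from datetime import date, timedelta


def within_window(record_day: date, today: date, days: int = 30) -> bool:
    """True if record_day falls in the last `days` days, counting today."""
    cutoff = today - timedelta(days=days - 1)
    return cutoff <= record_day <= today


def apply_rolling_window(records, today, days=30):
    """Split (day, payload) records into those to keep and those to discard."""
    kept, discarded = [], []
    for day, payload in records:
        target = kept if within_window(day, today, days) else discarded
        target.append((day, payload))
    return kept, discarded


# Made-up sample data, for illustration only.
today = date(2015, 3, 31)
records = [
    (date(2015, 3, 31), "click"),
    (date(2015, 3, 2), "click"),   # oldest day still inside the window
    (date(2015, 3, 1), "click"),   # one day too old
    (date(2015, 1, 15), "click"),
]
kept, discarded = apply_rolling_window(records, today)
```

Whether the discarded records are deleted, archived or replicated elsewhere is exactly the kind of decision that should be agreed up front.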

How will we identify which data is available, what the relationships are between data, what filters have been applied, and even who may access it?
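One possible way to answer these questions is to keep a small catalogue entry alongside each data set. The sketch below is only an illustration: `CatalogEntry`, its fields, the data set names and the roles are all hypothetical, and a real environment would more likely use a dedicated metadata or governance tool.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    # Hypothetical metadata record for one data set.
    name: str
    source: str
    filters_applied: list = field(default_factory=list)
    related_to: list = field(default_factory=list)
    allowed_roles: set = field(default_factory=set)

    def can_access(self, role: str) -> bool:
        """True if the given role is allowed to read this data set."""
        return role in self.allowed_roles


# Made-up entries, for illustration only.
catalog = {
    "web_logs_30d": CatalogEntry(
        name="web_logs_30d",
        source="web logs",
        filters_applied=["rolling 30-day window", "campaign = spring-sale"],
        related_to=["crm_contacts"],
        allowed_roles={"analyst", "marketing"},
    ),
    "crm_contacts": CatalogEntry(
        name="crm_contacts",
        source="CRM",
        allowed_roles={"analyst"},
    ),
}
```

Each entry records what is available, how it relates to other data, which filters have already been applied and who may access it.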


Data governance is not typically the first thing that data scientists think about. Yet if we do not plan and implement policies for managing our big data environment, we will be creating a problem for ourselves that will be very difficult to resolve in a year or two.

Data governance should align business and data management strategies to ensure that big data is not too big!

Insight not reporting

Finally, ask yourself the question: who is the audience for my reporting? The ability to slice and dice, experiment and drill down into various data sets is a feature of many traditional BI tools. At the executive decision-making level, however, less is more. Barry makes the very relevant point that information overload is a huge problem for most executives.

For many executives a simple answer to a key question may be all that is needed. Of course, this will depend on the executive and you may need supporting data in the event that the recommendation is questioned.

But simple answers delivered quickly may play a more important role in supporting decision making than complex dashboards.

Think about your audience all the way from preparation to visualisation if you want to show value.

Image sourced from https://polination.wordpress.com/2013/10/25/scrabble-cats/



