Artificial intelligence and machine learning are the new “big data” – the hottest topic in analytics and decision making.
The premise is that computers can think and learn like humans, and can replace humans for many tasks.
Projects like driverless cars; the Internet of Things (IoT) – where machines and devices communicate amongst each other to get things done without human involvement; and investment advising are some examples of the pervasive reach and scope of artificial intelligence and machine learning.

In an article written for Forbes in 2016, Bernard Marr spoke of AI as a revolution that will change everything about the way we produce, manufacture and deliver.
At the same time, he spoke of some of the very real dangers that AI presents – including legal and ethical concerns, along with the risks of plunging headfirst into AI without a clear AI governance plan and business case.
He also briefly touched on the reality that the fully autonomous, AI-powered, human-free industrial operation is still some way off.
Leadership in AI requires leadership in data quality
In a 2017 survey, over half of the 179 data scientists interviewed cited poor data quality as the biggest challenge hindering AI progress.
Big data is full of holes: missing, inconsistent and downright incorrect data that skews results, and it often lacks metadata – the information about the data that gives it context and makes it usable.
This means that data scientists typically spend more than 80% of their time on data engineering tasks – cleaning and preparing data to make it usable.
The challenges of managing data preparation and data quality at scale are exacerbated by the fact that big data sets increasingly combine internal data sets (such as customer, interaction, product and machine logs), over which we have at least nominal control, with external, publicly available data sets – such as state-sponsored demographics (census data, employment data, etc.) – over which we have little or no control.
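By way of illustration, the cleaning and preparation work described above might look like the following pandas sketch. The data set, column names and cleaning rules are invented for the example; real preparation pipelines are far longer, which is exactly why they consume so much of a data scientist's time.

```python
import pandas as pd

# Illustrative raw data with the kinds of holes described above:
# duplicates, inconsistent labels, missing values and a bad sentinel.
raw = pd.DataFrame({
    "customer_id": [101, 102, 102, 104, 105],
    "region": ["North", "north", "NORTH", None, "South"],
    "age": [34, 29, 29, -1, 41],  # -1 is a sentinel for "unknown"
})

clean = (
    raw
    .drop_duplicates(subset="customer_id")  # remove duplicate records
    .assign(
        # Normalise inconsistent labels ("north", "NORTH" -> "North")
        region=lambda d: d["region"].str.strip().str.title(),
        # Turn the -1 sentinel into a proper missing value
        age=lambda d: d["age"].where(d["age"] > 0),
    )
    .dropna(subset=["region"])  # drop rows missing a key field
)

print(clean)
```

Even this toy pipeline encodes several judgement calls (which fields are mandatory, what counts as a sentinel), and those decisions are precisely what needs to be governed and made visible to other stakeholders.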
What can we do?
- Recognise that not all data is equal (or valuable). Governed data catalogues allow data scientists (and other stakeholders) to find and understand the context of data, and to rate it in terms of its value and the level of trust it merits for a particular purpose.
- Invest in automating and simplifying data preparation. Data scientists need to spend less time writing code to deliver quality data and more time building AI models.
- Data preparation must include traceability, allowing data scientists and decision-makers to understand where the data used for models comes from and what changes have been made to adapt it for the model.
- Data quality must be delivered at big data scale to ensure that AI models are working with the best possible foundation for sound learning and decision making.
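The traceability point above can be sketched with a small wrapper that records every preparation step alongside the data itself. This is a minimal illustration of the idea, not the API of any particular lineage tool; the class name, file name and transformations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class TracedDataset:
    """A data set plus a record of every preparation step applied to it.

    Illustrative sketch of data lineage: each transformation returns a new
    object whose lineage list documents what was done, so a decision-maker
    can always see how the model's input differs from the source.
    """
    data: list
    source: str
    lineage: List[str] = field(default_factory=list)

    def apply(self, description: str, fn: Callable[[list], list]) -> "TracedDataset":
        # Record the step description alongside the transformed data.
        return TracedDataset(
            data=fn(self.data),
            source=self.source,
            lineage=self.lineage + [description],
        )

# Example: trace two cleaning steps over a toy record set.
ds = TracedDataset(data=[12, None, -3, 45], source="census_2016.csv")
ds = ds.apply("drop missing values", lambda rows: [r for r in rows if r is not None])
ds = ds.apply("drop negative sentinels", lambda rows: [r for r in rows if r >= 0])

print(ds.source, ds.lineage, ds.data)
```

The payoff is that the provenance question ("where did this number come from, and what was done to it?") can be answered from the data itself rather than from tribal knowledge.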
