
There’s a shortage of Marmite in South Africa.
The savoury spread is a by-product of beer production – which has been disrupted by repeated alcohol bans during COVID-19 peaks.
The goal of those bans has been to reduce alcohol-related trauma cases in already overwhelmed trauma units. Who would have seen a shortage of Marmite as an outcome?
Another unexpected casualty has been artificial intelligence.
Hundreds of AI tools have been built to catch COVID. None of them helped.
COVID-19 plunged the planet into a health crisis that was unprecedented and, as a result, poorly understood.
AI teams around the world stepped in – building hundreds of predictive tools intended to help hospitals diagnose or triage COVID patients faster.
Yet, in spite of this massive investment, none of them helped. In fact, some were potentially harmful.
Garbage in, garbage out
Numerous studies show that researchers across the world repeated the same basic errors in the way that they trained and tested their models.
At the time these models were being built, only public data sets were available. In many cases these were poorly labelled, or came from unknown sources.
As a result, duplicate data sets were in some instances consolidated, skewing outcomes. In others, the same data was used to both develop and test models, making them appear more accurate than they really were. Some patient data was included that had nothing to do with COVID – again training the models to draw invalid conclusions.
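To make the train-and-test-on-the-same-data mistake concrete, here is a minimal sketch (using synthetic data and scikit-learn, not any of the actual COVID data sets) of how the same model can look near-perfect when scored on its own training data and much weaker on data it has never seen:

```python
# Illustrative sketch only: how evaluating a model on the data it was trained
# on inflates apparent accuracy, compared with a properly held-out test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic "patient" data standing in for a real COVID data set
X, y = make_classification(n_samples=2000, n_features=30, n_informative=5,
                           random_state=0)

# Proper practice: hold out data the model never sees during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Scoring on the training data itself looks near-perfect...
print("Accuracy on training data:", accuracy_score(y_train, model.predict(X_train)))
# ...while the held-out test set reveals the true, lower performance
print("Accuracy on held-out data:", accuracy_score(y_test, model.predict(X_test)))
```

On synthetic data like this, the training-set score typically sits close to 100% while the held-out score is noticeably lower – exactly the kind of optimism that made many of these models look better than they were.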
A more insidious and subtle problem was the bias built into some data sets – for example, entire data sets were labelled as COVID diagnoses, causing their diagnostic features to be weighted more heavily in the models.
Is AI dead?
Marmite (and beer) are making a comeback and, I am pleased to say, so will AI.
What lessons can we learn?
When you consider the role of data, a thorny problem emerges. On the one hand, it is clear that having as much high-quality data as possible makes AI and machine learning algorithms work better. On the other hand, because the signal is hidden deep inside the data and can only be revealed by algorithms, it is not always straightforward to see how we can clean that data to improve its quality without obscuring the signal.
For #AI the differentiator is in the data, not the algorithm. Many of the issues uncovered in these models come down to a poor understanding of the data and its quality.
Basic data management principles – metadata management, collaboration and data quality – would have helped to address many of the issues identified, by providing context and standards for data sets. One outcome of these studies is an initiative by the World Health Organisation to implement a global data-sharing contract for use during future emergencies of this nature.
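As a small, hypothetical illustration of what those principles look like in practice, the sketch below (with invented column names) runs the kind of basic checks – duplicate records, missing labels, undocumented sources – that would have flagged several of the problems described earlier:

```python
# Hypothetical example of basic data quality checks on a patient data set.
# The column names ("patient_id", "source", "label") are illustrative only.
import pandas as pd

records = pd.DataFrame({
    "patient_id": [101, 102, 102, 103, 104],
    "source":     ["hospital_a", "hospital_b", "hospital_b", None, "hospital_a"],
    "label":      ["covid", "covid", "covid", None, "non-covid"],
})

# Duplicate patients that would skew outcomes if consolidated blindly
duplicates = records[records.duplicated(subset="patient_id", keep=False)]
print(f"Duplicate records: {len(duplicates)}")

# Records with no label or no documented source (missing metadata)
missing_label = records["label"].isna().sum()
unknown_source = records["source"].isna().sum()
print(f"Unlabelled records: {missing_label}, records with unknown source: {unknown_source}")
```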
AI is not dead.
But, ultimately, it will continue to struggle to deliver quality results unless we address the unsexy data management foundations.
Download this Precisely white paper to learn why identifying the biases present in the data is an essential step towards debugging the data that underlies machine learning predictions and, most importantly, improving data quality.