

A few weeks back I shared Ganes Kesari’s interview of Dr Tom Redman discussing the importance of data integrity for high-stakes artificial intelligence.


High-Stakes AI

High-stakes AI refers to the increasing use of AI and machine learning in making life-and-death decisions – in areas such as public health, conservation or justice.

In many use cases, occasional mistakes are acceptable.

Who really cares if one of ten movie recommendations was a bust, as long as most of the time we get to see something that we enjoy? Most of us would simply stop watching and move on to a different channel.


Decisions that kill

But with high-stakes decision-making, errors can literally kill.

Poor data practices, for example, meant that none of the hundreds of AI tools built to help diagnose COVID worked; Google Flu Trends overestimated the flu peak by 140%, and IBM's cancer treatment AI suffered reduced accuracy.

Solving these problems at scale requires the application of good, old-fashioned data cleansing. Google's research shows that AI's effectiveness is no longer limited by the models (algorithms) but by the quality of the data.
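What does "good, old-fashioned data cleansing" look like in practice? Here is a minimal sketch of automated data-quality checks that could run before records reach a training pipeline. The field names and thresholds are illustrative assumptions, not taken from the article or from any particular tool:

```python
# Illustrative sketch: reject bad records before they reach a model.
# Field names ("patient_id", "age", "diagnosis_code") and the age
# range are hypothetical examples, not a real schema.

def check_record(record):
    """Return a list of data-quality issues found in one record."""
    issues = []
    # Completeness: required fields must be present and non-empty.
    for field in ("patient_id", "age", "diagnosis_code"):
        if not record.get(field):
            issues.append(f"missing {field}")
    # Validity: values must fall within a plausible real-world range.
    age = record.get("age")
    if isinstance(age, (int, float)) and not (0 <= age <= 120):
        issues.append("age out of range")
    return issues

def clean(records):
    """Split records into usable rows and rejects, rather than
    silently training on flawed data."""
    ok, rejected = [], []
    for r in records:
        problems = check_record(r)
        if problems:
            rejected.append((r, problems))
        else:
            ok.append(r)
    return ok, rejected
```

The point of the sketch is the design choice: bad rows are quarantined with an explanation rather than dropped silently, so the team can see and fix upstream data problems instead of letting them cascade into the model.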

Data cascades in high-stakes AI

Poor-quality data fed into AI or ML models frequently triggers multiple negative downstream events that Google calls data cascades. These cascades stem from conventional AI/ML practices that undervalue data quality, and they can render high-stakes AI useless.

“Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact.”

“Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI

More data compounds the problem.

AI models are better built with smaller, high-quality data sets than with vast data sets of dubious or poor quality.

Data scientists and other advanced analytics specialists need support to ensure they are supplied with high-quality datasets that accurately reflect the real world, so that they can develop and train models that make high-stakes decisions safely.

Embrace DataOps

Ironically, these issues are preventable.

DataOps brings together cross-functional data analytics teams, agile methodologies, and modern data management tools. Together, these enhance collaboration and data curation, capture and share a sound understanding of the available data, and speed time to insight by fostering a culture of data excellence.

People, supported by tools, remain the key to the delivery of high-quality data for AI.

DataOps helps to ensure that data integrity is managed through the entire data lifecycle – from data creation to maintaining live data after the deployment of a model.

It is this capability that ensures that models can be safely moved into production, particularly for high-stakes applications.

