A few weeks back I shared Ganes Kesari’s interview of Dr Tom Redman discussing the importance of data integrity for high-stakes artificial intelligence.

High-Stakes AI
High-stakes AI refers to the increasing use of AI and machine learning in making life-and-death decisions – in areas such as public health, conservation or justice.
In many use cases, occasional mistakes are acceptable.
Who really cares if one in ten movie recommendations is a bust, as long as we usually find something we enjoy? Most of us would simply stop watching and move on to another channel.

Decisions that kill
But with high-stakes decision-making, errors can literally kill.
Poor data practices, for example, meant that none of the hundreds of AI tools built to help diagnose COVID worked; Google Flu Trends overestimated the flu peak by 140%; and IBM’s cancer treatment AI suffered reduced accuracy.
Solving these problems at scale requires good, old-fashioned data cleansing. Google’s research shows that AI’s effectiveness is no longer limited by the models (algorithms) but by the quality of the data.
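In practice, much of that cleansing is simple rule-based validation. As a minimal sketch (using hypothetical patient-record fields and thresholds purely for illustration), a cleansing step might enforce completeness and plausible real-world ranges before data ever reaches a model:

```python
# Minimal sketch of rule-based data cleansing.
# Field names and valid ranges are illustrative assumptions.

def clean_records(records, required_fields, valid_ranges):
    """Keep only records where every required field is present
    and numeric values fall within plausible real-world bounds."""
    cleaned = []
    for record in records:
        # Rule 1: completeness - no missing required fields.
        if any(record.get(f) is None for f in required_fields):
            continue
        # Rule 2: validity - numeric values within expected ranges.
        in_range = all(
            lo <= record[f] <= hi
            for f, (lo, hi) in valid_ranges.items()
            if record.get(f) is not None
        )
        if in_range:
            cleaned.append(record)
    return cleaned

raw = [
    {"patient_id": 1, "age": 54, "temp_c": 37.2},
    {"patient_id": 2, "age": None, "temp_c": 38.0},  # missing age
    {"patient_id": 3, "age": 41, "temp_c": 97.0},    # Fahrenheit entered as Celsius
]
clean = clean_records(
    raw,
    required_fields=["patient_id", "age", "temp_c"],
    valid_ranges={"age": (0, 120), "temp_c": (30.0, 45.0)},
)
# Only the first record survives both checks.
```

The point is not the specific rules but that quality constraints are made explicit and enforced upstream, rather than discovered as failures downstream.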
Data cascades in high-stakes AI
Poor-quality data fed into AI or ML models frequently triggers compounding, negative downstream events that Google calls data cascades. These cascades stem from conventional AI/ML practices that undervalue data quality, and they are now rendering high-stakes AI useless.
“Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact.”
— “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI
More data compounds the problem.
AI models are better built with smaller, high-quality data sets than with vast data sets of dubious or poor quality.
Data scientists and other advanced analytics specialists need support to ensure they are supplied with high-quality datasets that accurately reflect the real world, so that the models they develop and train can make high-stakes decisions safely.
Embrace DataOps
Ironically, these issues are preventable.
DataOps combines cross-functional data analytics teams, agile methodologies, and modern data management tools. Together, these enhance collaboration and data curation, capture and share a sound understanding of the available data, and speed time to insight by building a culture of data excellence.
People, supported by tools, remain the key to the delivery of high-quality data for AI.
DataOps helps to ensure that data integrity is managed through the entire data lifecycle – from data creation to maintaining live data after the deployment of a model.
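Maintaining live data after deployment usually means monitoring it for drift away from what the model was trained on. As a minimal sketch (the two-standard-deviation threshold is an illustrative assumption, not a prescribed rule), a check on a single numeric feature might look like:

```python
import statistics

def detect_drift(training_values, live_values, threshold=2.0):
    """Flag drift when the mean of live data deviates from the
    training mean by more than `threshold` training standard
    deviations - a crude but common first-line monitor."""
    mu = statistics.mean(training_values)
    sigma = statistics.stdev(training_values)
    live_mu = statistics.mean(live_values)
    return abs(live_mu - mu) > threshold * sigma

# Illustrative values only: a feature observed during training,
# then two batches of live data - one stable, one shifted.
training = [10, 11, 9, 10, 12, 10, 11, 9]
stable_batch = [10, 11, 10]
shifted_batch = [15, 16, 14]

no_drift = detect_drift(training, stable_batch)    # False
drift = detect_drift(training, shifted_batch)      # True
```

Real deployments would track many features with more robust statistics, but the principle is the same: data integrity checks continue after the model ships, not just before.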
It is this capability that ensures that models can be safely moved into production, particularly for high-stakes applications.
