A few weeks back I shared Ganes Kesari’s interview with Dr Tom Redman discussing the importance of data integrity for high-stakes artificial intelligence.
High-stakes AI refers to the increasing use of AI and machine learning in making life-and-death decisions – in areas such as public health, conservation or justice.
In many use cases, occasional mistakes are OK. Who really cares if one of ten movie recommendations was a bust, as long as most of the time we get to see something that we enjoy? Most of us would simply stop watching and move on to a different channel.
But with high-stakes decision-making, errors can literally kill. Poor data practices, for example, meant that none of the hundreds of AI tools built to help diagnose COVID worked, Google Flu Trends missed the flu peak by 140%, and IBM’s cancer treatment AI suffered reduced accuracy.
Solving these problems at scale requires the application of good, old-fashioned data cleansing. Google’s research shows that AI’s effectiveness is no longer limited by the models (algorithms) but by the quality of the data.
Poor-quality data fed into AI or ML models frequently leads to multiple negative downstream events that Google calls data cascades. These data cascades are driven by conventional AI/ML practices that undervalue data quality, and they are now rendering high-stakes AI useless. As the Google researchers put it, “Data quality carries an elevated significance in high-stakes AI due to its heightened downstream impact.”
Simply adding more data compounds the problem. AI models are better built with smaller, high-quality data sets than with vast data sets of dubious or poor quality.
Data scientists and other advanced analytics specialists need support to ensure that they are supplied with high-quality datasets that accurately reflect the real world, so that they can develop and train models that can safely make high-stakes decisions.
Ironically, these issues are preventable.
DataOps brings together cross-functional data analytics teams, agile methodologies, and modern data management tools. Together, these enhance collaboration and data curation, capture and share a sound understanding of available data, and speed time to insight by developing a culture of data excellence. People, supported by tools, remain the key to delivering high-quality data for AI.
DataOps helps to ensure that data integrity is managed through the entire data lifecycle – from data creation through to maintaining live data after deployment of a model. It is this capability that ensures that models can be safely moved into production, particularly for high-stakes applications.
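To make the idea of managing data integrity a little more concrete, here is a minimal sketch of the kind of automated check a team might run before data reaches a model. It is an illustration only: the function name `check_records`, the `QualityReport` class, the patient fields, and the acceptable ranges are all hypothetical assumptions, not part of any particular DataOps tool.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    """Counts of basic integrity problems found in a batch of records."""
    total: int
    missing: int
    duplicates: int
    out_of_range: int

    @property
    def passed(self) -> bool:
        # The batch passes only if no problems of any kind were found.
        return self.missing == 0 and self.duplicates == 0 and self.out_of_range == 0


def check_records(records, required, ranges):
    """Count incomplete, duplicated and out-of-range records.

    records:  list of dicts, one per record
    required: field names that must be present and not None
    ranges:   {field: (low, high)} inclusive bounds (illustrative values)
    """
    missing = duplicates = out_of_range = 0
    seen = set()
    for rec in records:
        # Incomplete: any required field is absent or None.
        if any(rec.get(field) is None for field in required):
            missing += 1

        # Duplicate: an identical record has already been seen.
        key = tuple(sorted((k, repr(v)) for k, v in rec.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

        # Out of range: any bounded field holds a value outside its bounds.
        if any(
            rec.get(field) is not None and not (low <= rec[field] <= high)
            for field, (low, high) in ranges.items()
        ):
            out_of_range += 1

    return QualityReport(len(records), missing, duplicates, out_of_range)


# Hypothetical example data, invented purely for illustration.
records = [
    {"patient_id": 1, "age": 54, "temp_c": 37.2},
    {"patient_id": 2, "age": None, "temp_c": 38.1},   # missing age
    {"patient_id": 3, "age": 230, "temp_c": 36.9},    # implausible age
    {"patient_id": 1, "age": 54, "temp_c": 37.2},     # exact duplicate
]
report = check_records(
    records,
    required=["patient_id", "age", "temp_c"],
    ranges={"age": (0, 120), "temp_c": (30.0, 45.0)},
)
print(report, "passed:", report.passed)
```

A check like this would not replace the people and processes described above. It simply shows how a quality gate can stop a flawed batch before it is used to train or feed a model.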