Last week I was at the BI and Analytics conference in Sandton, talking about prerequisites for artificial intelligence.
I came across this video of a presentation given by Collibra CTO Stan Christiaens in late 2017, in which he discusses AI, Big Data and Data Governance.
He describes a primary frustration of data scientists, “I can’t find the data”, and explains how curating data through a governed data catalog helps to solve this problem.
He goes on to talk about the emerging competence of artificial intelligence and makes a telling observation:
The differentiation is in the data
When talking about AI, the differentiation is not in the algorithm. AI processes are open source, easily available and commonly shared. Data scientists “sit in Jupyter or Zeppelin typing in Python commands” and get immediate results.
What makes the difference, for any business, is the quality and quantity of data that is available to feed the model.
Why do you think Google has been buying data acquisition companies for years? They do this to get the proprietary data that makes all the difference!
AI and machine learning bring new data quality challenges. It is not enough to know what you have; you must also begin to understand what you are missing, as discussed in “Is bias the seventh data quality metric?” Bias is truly the snake in the data grass.
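To make "understanding what you are missing" a little more concrete, here is a minimal sketch of one possible check: comparing how each group is represented in a training sample against the share you would expect in the real population. It is only an illustration, not a method from the presentation or the whitepaper. The function name `representation_gap`, the `region` field and the reference shares are all hypothetical.

```python
from collections import Counter


def representation_gap(records, key, expected_shares):
    """Return observed share minus expected share for each expected group.

    A strongly negative value suggests the group is under-represented
    in the sample relative to the (assumed) reference population.
    """
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        gaps[group] = observed - expected
    return gaps


# Hypothetical training sample: 8 urban customers, 2 rural customers.
training = [{"region": "urban"}] * 8 + [{"region": "rural"}] * 2

# Hypothetical reference shares for the population the model will serve.
reference = {"urban": 0.5, "rural": 0.5}

gaps = representation_gap(training, "region", reference)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.2f}")
```

In this made-up sample, rural customers fall well short of their expected share, which is the kind of gap a conventional completeness or validity check would not flag.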
Download the Syncsort whitepaper Debugging Data: Why Data Quality is essential for AI and Machine Learning for a discussion on the types of data quality concerns that must be considered when embarking on your AI journey.