#AI: Differentiation is in the data, not the algorithm

Last week I was at the BI and Analytics conference in Sandton, talking about prerequisites for artificial intelligence.

I came across a video of a presentation given by Collibra CTO Stan Christiaens in late 2017, in which he discusses AI, Big Data and Data Governance.

He describes a primary frustration of data scientists, “I can’t find the data”, and explains how curating data through a governed data catalog helps to solve this problem.

He goes on to talk about the emerging competence of artificial intelligence and makes a telling observation:

The differentiation is in the data

When it comes to AI, the differentiation is not in the algorithm. AI algorithms are open source, easily available and commonly shared. Data scientists “sit in Jupyter or Zeppelin typing in Python commands” and get immediate results.

What makes the difference, for any business, is the quality and quantity of data that is available to feed the model.

Why do you think Google has been buying data acquisition companies for years? They do this to get the proprietary data that makes all the difference!

AI and machine learning bring new data quality challenges – it is not enough to know what you have, you must also begin to understand what you are missing, as discussed in “Is bias the seventh data quality metric?”. Bias is truly the snake in the data grass.
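To make “understand what you are missing” a little more concrete, here is a minimal Python sketch of two simple checks: completeness (how often a field is actually populated) and representation (which groups you expect to see but barely find, or never find, in the data). The records, field names, province list and the 10% threshold are all hypothetical, chosen only for illustration; this is not the method from the whitepaper or the linked post.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = (
    [{"region": "Gauteng", "income": 50000 + 1000 * i} for i in range(5)]
    + [{"region": "Gauteng", "income": None} for _ in range(3)]
    + [{"region": "Western Cape", "income": 48000} for _ in range(2)]
)


def completeness(records, field):
    """Fraction of records where `field` is present and not None."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(field) is not None)
    return present / len(records)


def representation(records, field):
    """Share of records for each observed value of `field`."""
    counts = Counter(r.get(field) for r in records)
    total = len(records)
    return {value: n / total for value, n in counts.items()}


def underrepresented(records, field, expected_groups, min_share=0.10):
    """Expected groups that appear in less than `min_share` of the records (or not at all)."""
    shares = representation(records, field)
    return sorted(g for g in expected_groups if shares.get(g, 0.0) < min_share)


# Hypothetical list of groups we expect the population to contain.
expected_regions = ["Gauteng", "Western Cape", "KwaZulu-Natal", "Eastern Cape"]

print(f"income completeness: {completeness(records, 'income'):.0%}")  # income completeness: 70%
print(representation(records, "region"))  # {'Gauteng': 0.8, 'Western Cape': 0.2}
print(underrepresented(records, "region", expected_regions))  # ['Eastern Cape', 'KwaZulu-Natal']
```

The second check is the interesting one: a data set can be perfectly complete for every record it holds and still say nothing about the groups it never captured, which is where a model trained on it can quietly become biased.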

Download the Precisely whitepaper “Debugging Data: Why Data Quality is essential for AI and Machine Learning” for a discussion of the types of data quality concerns that must be considered when embarking on your AI journey.
