
Implementing robust systems to ensure data quality is imperative for data-driven decision-making processes.
There is an old story about an experiment involving nine monkeys.
Previous experience
Four monkeys are placed in a cage.
Every day, a plate of fresh fruit is placed into the cage. As the monkeys reach for the food the keepers come in and beat them. Over the course of a few weeks, the monkeys learn not to touch the fruit.
The Introduction of Bias
Now one of the monkeys is replaced with a new monkey.
This new monkey is happy to reach for food. Fearing retribution, the other monkeys attack him.
Soon, this monkey also learns not to touch the fruit.
At this point, another monkey is replaced, and the process repeats.
By the time the ninth monkey is introduced, he is being attacked by four monkeys that have never been beaten by the keeper.
How is this relevant to machine learning?
Like monkeys, machines learn based on prior experience, in this case, represented by data.
This makes machines bad at suggesting new, better ways of doing things. The inherent bias in the data, missing or inaccurate data, and other data quality issues will lead machines to learn bad habits, or to make poor recommendations.
How does data drift affect your machine learning model?: Understand the impact of data drift on the performance and reliability of your machine learning models.
How to define data quality: Understanding data quality is fundamental for any organization striving for accurate and reliable information.
Big data quality is a prerequisite for value from machine learning.

Leave a comment