Monkeys, bananas and machine learning

There is an old story about an experiment involving nine monkeys. Four monkeys are placed in a cage. Every day, a plate of fresh fruit is placed into the cage. As the monkeys reach for the food, the keepers come in and beat them. Over the course of a few weeks the monkeys learn not…

Good data or bad data – do you really care?

A short post this week – I am still travelling. Bias in decision making can trump data – as discussed in Is Bias the 7th data quality metric – and Is High Cholesterol bad? On the global stage – the analytics that predicted a Hillary Clinton victory in the 2016 US election was, in…

The Impact of Poor Data Quality on Machine Learning

We are surrounded by a huge amount of data. Data is everywhere and is gaining huge importance and relevance in today’s world. There are many firms that perform the tasks of gathering, retrieving and managing data. This requires systems that can help us handle that much data. Machine Learning has helped us in gathering…

Big data or big disaster?

When I first started posting about big data, very few users existed in South Africa. Today, most large organisations have a Hadoop data lake – in many cases replacing traditional ETL and/or acting as a data archive as well as a feeder to the enterprise data warehouse and various operational data marts. In a few,…

The future of business

When people think about the future of business, it’s often in terms of new technologies, such as artificial intelligence, virtual reality, or other concepts that even a few years ago would seem like impossible science fiction. But amidst all the glitz and glamour of a hyper-connected world, “the boring stuff” often goes unnoticed. The day-to-day science…

gender bias

Is “Bias” the 7th big data quality metric

A few weeks back I wrote about The 6 dimensions of big data quality. These are: Coverage – how well does the data source meet (or fail to meet) the business need? Continuity – how well does the data set cover all expected or needed intervals? Triangulation – how consistent is data when measured from…