datacitizens18

My key takeaways from #DataCitizens18

For three days last week, New York City became the #datacapital of the world. Data leaders and data citizens from 29 countries and every industry gathered for #datacitizens18. What were some key themes? 1) Is data governance dead? As governance becomes integral to every data-oriented scenario – analytics, AI, compliance, privacy and more –…

The Impact of Poor Data Quality on Machine Learning

We are surrounded by a huge amount of data. Data is everywhere, and its importance and relevance in today's world keep growing. Many firms are in the business of gathering, retrieving and managing data, which requires systems that can handle that much data. Machine learning has helped us in gathering…

data citizens

Previewing Collibra Data Citizens 2018

As you read this, I will be on my way to New York City ahead of next week's Collibra Data Citizens conference. This is the third year that the event is being held, and it is rapidly becoming the event of the year for organisations that are serious about their data. It is great that…

Big data or big disaster?

When I first started posting about big data, very few users existed in South Africa. Today, most large organisations have a Hadoop data lake, in many cases replacing traditional ETL and/or acting as a data archive as well as a feeder to the enterprise data warehouse and various operational data marts. In a few,…

The future of business

When people think about the future of business, it's often in terms of new technologies, such as artificial intelligence, virtual reality, or other concepts that even a few years ago would have seemed like impossible science fiction. But amid all the glitz and glamour of a hyper-connected world, "the boring stuff" often goes unnoticed. The day-to-day science…

gender bias

Is "Bias" the 7th big data quality metric?

A few weeks back I wrote about the 6 dimensions of big data quality. These are: Coverage – how well does the data source meet (or fail to meet) the business need? Continuity – how well does the data set cover all expected or needed intervals? Triangulation – how consistent is data when measured from…