statistics

Big Data are not only Big but Complex, Messy, Badly Sampled, and Creepy

In researching last week’s post, I came across this great lecture by Professor Thomas Lumley. Prof Lumley discusses, in an entertaining and understandable way, the role that statisticians must play in data science. A great piece for any aspiring data scientist, or the interested layperson. Data Science: Will Computer Science and Informatics Eat Our Lunch? Mainstream…

data quality kpi

Data profiling is not data quality

For years, I have been a proponent of data quality measurement. Data quality cannot exist without management (and some would argue without governance). Meaningful data quality metrics play a critical role in managing data quality. Posts such as Don’t blow up the whale; Changing data behaviour through KPIs; and Accuracy, Completeness and Speed of Execution…

winning team - 1995

Creating a winning data culture

This post is inspired by Joost van der Westhuizen, who died earlier this month after a long battle with motor neuron disease. Joost was arguably the greatest ever scrum-half, and was a key member of the 1995 World Cup-winning Springbok team. The 1995 Rugby World Cup unified South Africans as they had never…

saints

Geocoding – When the Saints come marching in

I was recently asked to deliver a proof of concept for a prospective client who is trying to improve the accuracy of their customer address data. Previous attempts at geocoding – adding a longitude and latitude point to each address – had been extremely difficult. The process had delivered limited results and had required months…

What is the impact of poor data quality?

Poor data quality can have a real impact, as discussed in posts such as Data Quality: The direct impact on profit and Data Key to AML efforts. Address data quality was the focus of t drivers for quality address data But clowns at a funeral? Not one that I came up with but the focus on…

Data Quality tools are essential

Selling the steak – not the sizzle….

Consider this figure: $136 billion per year. That’s the research firm IDC’s estimate of the size of the worldwide big data market in 2016. This figure should surprise no one with an interest in big data. But here’s another number: $3.1 trillion, IBM’s estimate of the yearly cost of poor-quality data, in the US…