
Focus on Data Science
I recently attended the first Africa Data Forum in Johannesburg – a three-day event focusing on data science and skills development.
Data governance a key theme
Data governance was a key theme of the first day of the event.
Research shows that data scientists spend, on average, 60% of their time simply finding and preparing the right data sets for their analysis. Data governance has proven effective in reducing this “wasted” time.
What makes “data science” science?
A number of the discussions turned to the “science” aspect of data science.
Scientific method
The scientific method suggests that one forms a hypothesis and then proves or disproves it; in the data scientist’s case, this is done through the analysis of (big) data.
We discussed this in our post “Data Scientists must see the story behind the data”.
Peer review
Yet another aspect of “real” science is that papers and proofs are published and made available for peer review. Both the method and, in many cases, the data must be made available so that the scientist’s peers can test his or her hypothesis, calculations and conclusions.
Without these rigorous checks and balances, something like this may happen:
Data scientists, however, typically work on hypotheses intended for competitive differentiation, so their results cannot be published in trade journals.
Trust in the results
Yet some level of trust in the outcomes must still be established.
Principles such as data integrity, data traceability and consistent definitions of terms, driven by engagement and governance from the appropriate subject matter experts, could go a long way towards ensuring that data science results are reliable and trustworthy.
Could it be that data governance will be the peer review process for data science?
