Earlier this year Forbes published the results of yet another survey showing that data preparation continues to be the most time-consuming task for data scientists.
- The typical data scientist spends around an hour and a half every day collecting data sets, and another five hours cleansing data and getting it organised for analytics
- This leaves only an hour and a half for advanced analytics
This is not a surprise – multiple surveys have shown similar results.
What was unusual is that this survey also asked data scientists to rate their least enjoyable tasks.
- 76% of data scientists regard data preparation as the least enjoyable part of their job
- Yet they spend 80% of their time doing just this…
Data governance and data quality are prerequisites for advanced analytics and data science.
When looking for data scientists we tend to look for advanced analytics proficiency – R programming, SQL, Hadoop, statistical modelling, and the like.
Arguably, enterprise information modelling skills such as data stewardship, data profiling and data cleansing are at least as important.
Either we need to bring these competencies into our data science team, or we need to ensure that the data provided to our data scientists is of better quality, as discussed in my post Earning the right to go wide.
What do you think? Should data scientists be freed from cleaning data, or is this a core job function that requires more focus?