Earlier this year, Forbes published the results of yet another survey showing that data preparation remains the most time-consuming task for data scientists:

- The typical data scientist spends around an hour and a half every day collecting data sets, and another five hours cleansing data and getting it organised for analytics
- This leaves only an hour and a half for advanced analytics
This is not a surprise – multiple surveys have shown similar results.
What was unusual is that this survey also asked data scientists to rate their least enjoyable tasks:

- 76% of data scientists regard data preparation as the least enjoyable part of their job
- Yet they spend 80% of their time doing just this… No wonder the turnover of data scientists is so high.
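As a quick sanity check, the 80% figure is consistent with the daily breakdown reported in the survey. This assumes that the three activities listed (collecting, cleansing and organising, advanced analytics) together make up an eight-hour working day:

```latex
\frac{1.5 + 5}{1.5 + 5 + 1.5} = \frac{6.5}{8} \approx 0.81 \;(\approx 80\%)
```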
The Importance of Data Integrity
Data governance and data quality are prerequisites for advanced analytics and data science.
When hiring data scientists, we tend to look for advanced analytics proficiency: R programming, SQL, Hadoop, statistical modelling, and the like.
Arguably, enterprise information management skills such as data stewardship, data profiling and data cleansing are at least as important.
Either we need to bring these competencies into our data science teams, or we need to ensure that the data provided to our data scientists is of better quality, as discussed in my post "Earning the right to go wide".
What do you think? Should data scientists be freed from cleaning data, or is this a core job function that requires more focus?
