
When Harvard Business Review first touted the data scientist as the sexiest job of the 21st century back in 2012, the role was still in its infancy.
The promise of advanced analytics, and the insights that businesses could gain about their customers, their interactions, their products and everything else, was rightly identified as potential gold.
Data was the new oil and data scientists were the prospectors, the drillers, the refiners and the distributors rolled into one. At the time I suggested that the data scientist function would need to be fulfilled by a group
With the benefit of hindsight, it became clear that data science is an end game. In fact, most data scientists spend more time finding and preparing data for analysis than they spend on actual analytics.
The critical skill for data science is less analytics modelling and more data preparation – good, old-fashioned, tedious data management work that has always been needed for analytics.
This data engineering capability is seen as separate from data science.
This led me to ask, last year, “Is data science still the sexiest job of the 21st century?”
This morning I saw the following question posed on Quora, the popular Q&A platform:
Is a data engineer more in demand than data scientists?
The various responses are worth reading in their own right. Broadly speaking, though, they reflect the realization that data engineering forms more than 60% of the effort of any data science program, and that it is also the practical part of the delivery. As a result, data engineering is overtaking data science as the primary role player in most analytics teams, and is in equally short supply.
This makes the automation of routine data engineering tasks, such as data ingestion and data cleansing, even more critical.
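To make "routine" concrete, here is a minimal sketch of the kind of ingestion and cleansing step that is worth automating. It uses only the Python standard library. The function names (`ingest_csv`, `cleanse_rows`), the sample data and the cleansing rules are assumptions chosen for illustration, not a description of any particular tool.

```python
import csv
import io


def cleanse_rows(rows):
    """Trim whitespace, turn empty strings into None, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize column names and values; ignore overflow fields (key None).
        normalized = {
            key.strip().lower(): (value.strip() or None) if value is not None else None
            for key, value in row.items()
            if key is not None
        }
        # A hashable fingerprint of the row lets us detect exact duplicates.
        fingerprint = tuple(sorted(normalized.items(), key=lambda kv: kv[0]))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


def ingest_csv(text):
    """Parse CSV text into dictionaries and pass them through cleansing."""
    reader = csv.DictReader(io.StringIO(text))
    return cleanse_rows(reader)


# Hypothetical sample: inconsistent spacing, a duplicate record, a missing email.
raw = "Name, Email\n Ann ,ann@example.com\nAnn,ann@example.com\nBob,\n"
records = ingest_csv(raw)
for record in records:
    print(record)
```

Even a toy example like this shows why the work is tedious by hand: each rule is trivial, but every new source needs the same rules applied consistently.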
Lessons for Machine Learning and Artificial Intelligence
In 2023, machine learning and artificial intelligence have replaced data science as the sexiest technology. According to a TDWI checklist report, making the jump from test and training environments to full production environments requires a smart data pipeline strategy. This includes ensuring that the right tools and processes are in place so that all the data used in model building is accessible, clean, understood and governed. It also means that the data environment needs to support operationalizing machine learning models against new and big data, which will necessitate keeping data current and involve real-time processing and automation.
Read the Checklist Report to learn about the data engineering challenges facing organizations that want to take advantage of machine learning, and about best practices for data engineering and management to support machine learning and artificial intelligence.
Businesses must learn from the past, but probably won’t.
Data engineering and basic data management capabilities are critical to every advanced analytics program.
