Data engineering is the term that has emerged to describe the tasks related to delivering useful data for analytics – particularly in relation to data science.
With between 60% and 90% of the effort of most big data projects allocated to data engineering tasks, the role has matured as organisations found that traditional data scientists either did not have the skills, or did not have the aptitude, to focus on data preparation.
Data engineers focus on delivering a quality data pipeline, leaving the data scientist free to deliver analytics. For example, they may be responsible for building data pipelines to collect and store data; building extract, transform and load (ETL) processes to prepare data for analysis; designing data quality processes to cleanse or enrich data; and engineering services and frameworks to deliver the data needed for analytics.
This means that data engineers focus on programming, systems and data management skills rather than quantitative skills.
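As a sketch of the kind of ETL and data quality work described above, the snippet below walks a toy dataset through extract, transform and load steps. The source data, field names and cleansing rules are entirely illustrative assumptions, not taken from any real system:

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: field names and values are illustrative only.
RAW_CSV = """customer_id,signup_date,country
1001,2015-03-02,ZA
1002,,za
1003,2015-04-17,GB
"""

def extract(raw):
    """Extract: read the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows):
    """Transform: apply data quality rules - drop rows missing a
    signup date and normalise country codes to upper case."""
    cleaned = []
    for row in rows:
        if not row["signup_date"]:
            continue  # quality rule: signup date is mandatory
        row["country"] = row["country"].upper()
        cleaned.append(row)
    return cleaned

def load(rows, conn):
    """Load: write the prepared rows into an analytics table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id TEXT, signup_date TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (:customer_id, :signup_date, :country)",
        rows,
    )

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
```

In practice each stage would be a separate, monitored job in a pipeline tool rather than three functions in one script, but the division of responsibilities is the same: the data engineer owns these steps so the data scientist starts from clean, queryable data.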
The right tools can really help the data engineer deliver more quickly and more cost-effectively – a topic we will cover at next week's DAMA Johannesburg Chapter meeting, "How does big data change data integration?"
Hope to see you there!