Managing the modern data pipeline

A few weeks back I wrote about the emerging role of the data engineer: the people responsible for delivering the quality data pipelines that enable the data scientist. I followed it up with this tweet, which I believe summarises very concisely the changing reality of big data and advanced analytics 2012…

ETL ELT architecture

What is ETL?

ETL defined: Extract, Transform and Load, or ETL, is a standard information management term used to describe a process for the movement and transformation of data. ETL is commonly used to populate data warehouses and datamarts, and for data migration, data integration and business intelligence initiatives. ETL processes can be built by manually writing custom scripts or code, with…
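To make the three stages concrete, here is a minimal sketch of a hand-written ETL script of the kind the excerpt mentions. Everything in it is an assumption made for illustration: the sample CSV, the `sales` table, and the function names `extract`, `transform`, `load` and `run_etl` do not come from the original post, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an operational system.
RAW_CSV = """customer_id,name,amount
1, alice ,10.50
2,BOB,20.00
3,carol,
"""


def extract(text):
    """Extract: read the CSV source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # incomplete row, skipped rather than loaded
        cleaned.append(
            (int(row["customer_id"]), row["name"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `sales` table holds only the complete, cleaned rows. Keeping each stage in its own function means any one of them can be swapped out, for example reading from an API instead of a CSV, without touching the others.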

Four steps to transforming data

Another brief post this week on an area that we do not focus on very often: data transformation. Data transformation is a relatively mundane yet fundamental data management capability, particularly when dealing with similar data from multiple sources. Three simple examples: System A represents Male and Female as 0 and 1, while System B…
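The kind of code reconciliation described in the first example might be sketched as below. The excerpt is cut off before it says how System B encodes the same values, so the `SYSTEM_B` mapping here is purely hypothetical, as are the canonical target codes and the `normalise` helper. Which digit System A uses for which value is also an assumption, read from the order in which the excerpt lists them.

```python
# Canonical codes the target system is assumed to use (hypothetical).
CANONICAL = {"male": "M", "female": "F"}

# System A: assumed to use 0 for Male and 1 for Female.
SYSTEM_A = {0: "male", 1: "female"}

# System B: hypothetical encoding, not taken from the original post.
SYSTEM_B = {"m": "male", "f": "female"}


def normalise(value, source_mapping):
    """Map a source-specific code to the canonical code, or None if unknown."""
    # Strings are trimmed and lower-cased so " F " and "f" are treated alike.
    key = value.strip().lower() if isinstance(value, str) else value
    meaning = source_mapping.get(key)
    return CANONICAL.get(meaning)


if __name__ == "__main__":
    print(normalise(0, SYSTEM_A))
    print(normalise(" F ", SYSTEM_B))
    print(normalise(9, SYSTEM_A))
```

Returning `None` for an unrecognised code, rather than guessing, lets a later quality check count and report the values that could not be mapped.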