Managing the modern data pipeline

A few weeks back I wrote about the emerging role of the data engineer – the group of people responsible for delivering the quality data pipelines that enable the data scientist. I followed it up with this tweet – which I believe summarises very concisely the changing reality of big data and advanced analytics 2012…

Data Governance versus Data Quality

Data quality and data governance both strive to optimise data and information to meet business needs. Put simply, however, where data governance deals with the definition of, and responsibility for, data management standards, data quality deals with the practical implementation, monitoring and enforcement of these standards for individual platforms and systems. Both data governance…

Big data use case: Offloading the data warehouse to Hadoop

The true cost of ELT

Today’s business world is demanding more from the data warehouse, because more than ever an organisation’s survival depends on its ability to transform data into actionable insights. However, ELT data integration workloads are now consuming up to 80% of database capacity, resulting in rising infrastructure costs, increasing batch windows, longer development cycles, slower…

What is data integration? (building a single view of the truth)

Data integration defined

Data integration is a common industry term referring to the requirement to combine data from multiple separate business systems into a single unified view, often called a single view of the truth. This unified view is typically stored in a central data repository known as a data warehouse. For example, customer data integration involves…
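As a rough illustration of the "single view" idea, here is a minimal Python sketch that merges customer records from two separate systems into one unified record per customer. The source names (`CRM`, `BILLING`), the field names and the choice of a normalised email address as the matching key are all assumptions made for the example; real customer data integration usually needs far more careful matching.

```python
# Hypothetical records from two separate business systems.
CRM = [{"email": "Ann@Example.com", "name": "Ann Lee"}]
BILLING = [
    {"email": "ann@example.com ", "balance": 42.0},
    {"email": "bo@example.com", "balance": 0.0},
]


def customer_key(record):
    """Normalise the email so the same customer matches across systems."""
    return record["email"].strip().lower()


def build_single_view(*sources):
    """Combine records from every source into one dict per customer.

    Assumption: when two systems supply the same field, the first
    source listed wins.
    """
    view = {}
    for source in sources:
        for record in source:
            key = customer_key(record)
            merged = view.setdefault(key, {"email": key})
            for field, value in record.items():
                if field != "email":
                    merged.setdefault(field, value)
    return view


single_view = build_single_view(CRM, BILLING)
```

Here `single_view` holds one entry per customer, with Ann's name from the CRM system and her balance from billing sitting side by side. In practice this unified view would be written to the data warehouse rather than kept in memory.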

Cloudera and Hortonworks merge – good news for customers

Late last year, Cloudera and Hortonworks announced plans to merge the two companies – a move that took effect early this year. According to Tendü Yoğurtçu, CTO of our partner Syncsort, the two companies have emerged as clear winners in the data space and gained momentum. “But each has had its own unique strengths,…

What is ETL?

ETL defined

Extract, Transform and Load, or ETL, is a standard information management term used to describe a process for the movement and transformation of data. ETL is commonly used to populate data warehouses and data marts, and for data migration, data integration and business intelligence initiatives. ETL processes can be built by manually writing custom scripts or code, with…
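To make the three stages concrete, here is a minimal hand-written ETL sketch in Python using only the standard library. The sample CSV, the `sales` table and the cleaning rules are hypothetical, chosen purely to show the extract, transform and load steps in order; this is an example of the "custom script" approach, not of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: one row has a missing amount.
RAW_CSV = """customer_id,name,amount
1, alice ,10.50
2,BOB,20.00
3,carol,
"""


def extract(text):
    """Extract: read raw rows from the source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, convert types, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append(
            (int(row["customer_id"]), row["name"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return the number of rows loaded."""
    rows = transform(extract(text))
    load(rows, conn)
    return len(rows)


warehouse = sqlite3.connect(":memory:")
loaded = run_etl(RAW_CSV, warehouse)
```

The in-memory SQLite database stands in for the data warehouse. Note that the transformation happens before the load, which is what distinguishes ETL from ELT, where raw data is loaded first and transformed inside the target database.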

Turning “No” into “Yes” through data governance

Data governance has become associated with the word “No.” This is the intriguing start to a recent IDC perspective discussing how Cox Automotive bucked the trend and used data governance to become the “Yes” team – delivering the right data to the right person at the right time. The report, which can be accessed here,…