Are you struggling to really track lineage?

Understand the importance of data lineage for business users seeking reliable insights. Discover how business traceability diagrams enhance understanding and accuracy. Learn how automation and tools like MANTA can provide a complete, accurate view of data movement and changes within your organization.


Data lineage is generally defined as a kind of data life cycle that tracks the data’s origins and where it moves over time, as well as changes (such as aggregations or transformations) that may happen to it as it moves.
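As a rough illustration of this definition, lineage can be modelled as a directed graph in which each edge records where data moved and what happened to it on the way. The sketch below is a minimal example under stated assumptions: the dataset names, the transformations and the helper functions are all hypothetical, and it does not represent how any particular lineage tool stores its metadata.

```python
from collections import defaultdict

# Hypothetical lineage edges: (source, target, transformation applied on the way).
EDGES = [
    ("crm.customers", "staging.customers", "copy"),
    ("erp.orders", "staging.orders", "copy"),
    ("staging.customers", "warehouse.sales", "join on customer_id"),
    ("staging.orders", "warehouse.sales", "join on customer_id"),
    ("warehouse.sales", "report.monthly_revenue", "aggregate by month"),
]


def build_upstream_index(edges):
    """Map each target dataset to the (source, transformation) pairs feeding it."""
    index = defaultdict(list)
    for source, target, transformation in edges:
        index[target].append((source, transformation))
    return index


def trace_origins(dataset, edges):
    """Return the datasets with no upstream source that ultimately feed `dataset`."""
    index = build_upstream_index(edges)
    origins, seen, stack = set(), set(), [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = index.get(current, [])
        if not parents:
            # Nothing feeds this dataset, so it is an origin.
            origins.add(current)
        for source, _transformation in parents:
            stack.append(source)
    return origins


print(sorted(trace_origins("report.monthly_revenue", EDGES)))
# ['crm.customers', 'erp.orders']
```

Walking the same edges in the opposite direction would give the impact of a change: every dataset downstream of a modified source.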


Historically, this meant that data lineage became a term associated with ETL (extract, transform, load), a process for the movement and transformation of data. ETL developers use the data lineage embedded in their ETL tool of choice to visually represent ETL flows, identify issues and understand the impact of changes.

From Data Lineage to Business Traceability

Initially driven by compliance requirements, such as BCBS 239, the business has become increasingly aware of how critical lineage is to understanding the source of data and delivering well-understood and trusted reports.

However, technically oriented views of lineage often do not support business users, who need a simplified view that shows at a glance that data is sourced from the right place and is an accurate representation of the source.

Pioneered by business-oriented vendors, such as our partner, Precisely, data catalogues are designed to add business context to data lineage – allowing users to answer questions such as “Where does my data come from? What policies were used? What standards are applied?”

The ability to present both technical lineage and business traceability diagrams is critical to understanding data and using it effectively.

ETL provides an incomplete picture

An unspoken truth, however, is that ETL typically provides an incomplete picture.

  1. As described above, business traceability adds business context to ETL processes. This context may also include manually captured steps in an ETL process. For example, there may be cases where humans are required to intervene and edit data, or add data points from the source, to meet specific criteria. These manual edits cannot be detected by an ETL process and must be manually added to complete a lineage view.
  2. In many environments, formal ETL processes may call stored procedures, or other database-level code, in order to perform transformations or aggregations. This code is not typically included in a lineage view.
  3. Finally, many organisations run multiple ETL tools for different purposes: perhaps one of the big vendors for their data warehouse, a specialist like Precisely Connect for Hadoop, and combinations of SQL code or Microsoft SSIS for departmental work.
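To make these gaps concrete, the sketch below compares the lineage an ETL tool can see on its own with the lineage obtained once stored-procedure and manual steps are added. Every dataset name and edge here is a made-up example, and `is_reachable` is a hypothetical helper, not part of any vendor's product.

```python
# Hypothetical (source, target) edges, grouped by where each one was captured.
ETL_TOOL_EDGES = {
    ("erp.orders", "staging.orders"),
    ("warehouse.sales", "report.monthly_revenue"),
}
MANUAL_STEP_EDGES = {
    # An analyst corrects records by hand; no ETL job records this step.
    ("staging.orders", "staging.orders_cleaned"),
}
STORED_PROCEDURE_EDGES = {
    # Database-level code performs the aggregation outside the ETL tool.
    ("staging.orders_cleaned", "warehouse.sales"),
}


def is_reachable(source, target, edges):
    """Return True if `target` can be reached from `source` by following edges."""
    seen, stack = set(), [source]
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(t for s, t in edges if s == current)
    return False


combined = ETL_TOOL_EDGES | MANUAL_STEP_EDGES | STORED_PROCEDURE_EDGES

# The ETL tool alone cannot connect the source system to the report.
print(is_reachable("erp.orders", "report.monthly_revenue", ETL_TOOL_EDGES))  # False
# With the manual and stored-procedure steps included, the chain is complete.
print(is_reachable("erp.orders", "report.monthly_revenue", combined))  # True
```

In this toy example the report looks disconnected from its source until the two missing steps are supplied, which is the kind of gap a consolidated lineage view is meant to close.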

How do we bring this all together?

Automation is key

A typical large organisation may have tens of thousands of ETL and SQL scripts and stored procedures that must be documented and maintained.

In a case study titled "The only reason developers exist", our partner, MANTA, discussed the challenges involved in trying to keep this complex landscape up to date at a Fortune 500 bank.

The customer faced three common challenges:

  1. The inevitability of change means that ETL processes are in a constant state of flux
  2. The pain of documentation means that developers do not accurately document their processes, resulting in
  3. Make-believe lineage: a documented view of the ETL world that does not reflect reality
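One way to picture "make-believe lineage" is as the difference between the edges that were written down and the edges that actually exist. The sketch below assumes, purely for illustration, that both views are available as sets of (source, target) pairs; the edge data and the `lineage_drift` helper are hypothetical and say nothing about how MANTA itself works.

```python
def lineage_drift(documented, observed):
    """Compare documented lineage edges with edges observed in the real system."""
    return {
        # Documented flows that no longer exist in practice.
        "stale": sorted(documented - observed),
        # Real flows that nobody has documented.
        "undocumented": sorted(observed - documented),
    }


# Hypothetical example data: (source, target) pairs.
documented = {
    ("erp.orders", "staging.orders"),
    ("staging.orders", "warehouse.sales"),
}
observed = {
    ("erp.orders", "staging.orders"),
    ("staging.orders", "staging.orders_cleaned"),
    ("staging.orders_cleaned", "warehouse.sales"),
}

drift = lineage_drift(documented, observed)
print(drift["stale"])
# [('staging.orders', 'warehouse.sales')]
print(drift["undocumented"])
# [('staging.orders', 'staging.orders_cleaned'), ('staging.orders_cleaned', 'warehouse.sales')]
```

When the observed edges are refreshed automatically, a comparison like this can be rerun after every change rather than relying on developers to update documentation by hand.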

Using MANTA allows companies to automatically update lineage across a range of ETL tools, data sources and reporting platforms, providing a consolidated and accurate view of your lineage and, if desired, pushing it to tools like Collibra to add the business context and business traceability.

Companies that need to understand the source of their data and get a true picture of how it moves and changes through their organisation should take a look.
