Identifying Personally Identifiable Information across multiple databases using unified data lineage

With fewer than 12 months to achieve PoPIA compliance, many organisations are in a race against time to identify and govern personal data across the business.

A few weeks back, we spoke about the importance of a top-down approach to PoPIA compliance. In effect, we suggested learning from the EU implementation of GDPR and focusing on high-priority / high-risk processes, rather than trying to classify all data attributes.

Subsequently, I touched on how to find personal data in your ERP or CRM system using business-friendly metadata.

This week I will look at solving another of the biggest challenges identified during the GDPR compliance journey in Europe:

Tracking the movement of personal data across multiple databases

Modern data architectures are characterised by the movement of data between systems, both internally and, increasingly, to external systems in the cloud. A key requirement for PoPIA compliance is to identify the various locations where sensitive or unmanaged (noncompliant) data is being stored.

A unified data lineage platform can help to address key pillars of PoPIA and allow you to scale your top-down approach rapidly.

  1. Know your customer data: PoPIA requires that you know where customer data is stored and also how, and why, it has been shared with other systems, both internally and externally. Using the unified data lineage platform from MANTA, you can parse SQL and ETL code to produce technical and business data lineage maps that any stakeholder interested in data privacy can easily interpret.
  2. Extend your data classification efforts: A top-down approach implies that not all personal data will be classified immediately. However, once critical PII fields have been classified, data lineage will identify related elements that can be classified immediately. For example, ‘Customer Name’ identified in our ERP can be mapped using lineage to the CRM, to the EDW, to the data lake, etc., and each of these attributes can also be tagged and managed according to your PoPIA policies, which may require you to anonymise or encrypt the data, restrict access, or take other steps to ensure compliance.
  3. Help with data portability: PoPIA allows anyone to request a copy of the data you store about them. Unified data lineage makes it much easier to understand how personal data is moved through your enterprise.
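
To make the second point concrete, here is a minimal Python sketch of how a classification tag could be propagated along lineage. It is an illustration only and does not use MANTA's API: the `propagate_tags` function, the column names, and the edge list are all hypothetical, and the lineage is assumed to be available as simple (source column, target column) pairs.

```python
from collections import defaultdict, deque


def propagate_tags(edges, seed_tags):
    """Spread classification tags from seed columns to all downstream columns.

    edges     -- iterable of (source_column, target_column) lineage pairs
    seed_tags -- dict mapping already-classified columns to a list of tags
    """
    # Build a lookup of direct downstream columns for each source column.
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)

    tags = defaultdict(set)
    for column, column_tags in seed_tags.items():
        tags[column] |= set(column_tags)

    # Breadth-first walk; a column is revisited only when it gains new tags,
    # so the loop also terminates on cyclic lineage.
    queue = deque(seed_tags)
    while queue:
        column = queue.popleft()
        for target in downstream[column]:
            new_tags = tags[column] - tags[target]
            if new_tags:
                tags[target] |= new_tags
                queue.append(target)

    return {column: sorted(t) for column, t in tags.items() if t}


# Hypothetical column-level lineage: ERP -> CRM -> EDW -> data lake.
edges = [
    ("erp.customer.name", "crm.contact.full_name"),
    ("crm.contact.full_name", "edw.dim_customer.customer_name"),
    ("edw.dim_customer.customer_name", "lake.customers.name"),
    ("erp.product.sku", "edw.dim_product.sku"),
]

# Only the ERP field has been classified manually (the top-down starting point).
result = propagate_tags(edges, {"erp.customer.name": ["PII"]})

for column, column_tags in result.items():
    print(column, column_tags)
```

In this sketch the product columns stay untagged because nothing classified flows into them. Propagating on lineage alone will over-tag wherever a transformation masks or aggregates the field, so the output is best treated as a list of candidates for review.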

Identifying PII in MANTA using Active Tags

The complexity of today’s data landscapes makes it almost impossible to discover this lineage manually.

Using MANTA to automatically analyse all database objects and data processing logic allows you to very quickly identify how personal data flows through your business, and take appropriate steps to protect it.
