A practical guide to data anonymization


When researching data de-identification, you’ll come across terms such as anonymization, pseudonymization, and generalization.

How do we de-identify data?

“Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.” – Wikipedia

“Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms.[1] A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing.” – Wikipedia
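To make the idea of replacing identifying fields with pseudonyms concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended production design: the field names, the `SECRET_KEY` value, and the helper names `pseudonym` and `pseudonymize` are all hypothetical, and a keyed hash (HMAC-SHA256) is only one of several ways to generate pseudonyms.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept in a key store, not in code.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical choice of which fields count as personally identifiable.
PII_FIELDS = ("name", "email")


def pseudonym(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable artificial identifier for a value, using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def pseudonymize(record: dict, fields=PII_FIELDS) -> dict:
    """Copy the record, replacing each listed field with its pseudonym."""
    out = dict(record)
    for field in fields:
        if field in out:
            out[field] = pseudonym(str(out[field]))
    return out


# Made-up sample records for illustration only.
records = [
    {"name": "Alice Example", "email": "alice@example.com", "visits": 3},
    {"name": "Alice Example", "email": "alice@example.com", "visits": 5},
]
masked = [pseudonymize(r) for r in records]
```

Because the same input always maps to the same pseudonym, the two masked records can still be grouped or joined for analysis, while the original name and email no longer appear in the output. Anyone holding the key could recompute the mapping, which is one reason pseudonymized data is generally treated as less strongly protected than anonymized data.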

“A generalization is a form of abstraction whereby common properties of specific instances are formulated as general concepts or claims.[1] Generalizations posit the existence of a domain or set of elements, as well as one or more common characteristics shared by those elements (thus creating a conceptual model). As such, they are the essential basis of all valid deductive inferences (particularly in logic, mathematics and science), where the process of verification is necessary to determine whether a generalization holds true for any given situation.” – Wikipedia

The challenge with each of these terms is that real-world implementation, particularly when applied to data privacy, is open to interpretation. Data privacy is context-sensitive – my doctor may need complete access to my medical records, for example, while his assistant may only need access to my payment records.
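The doctor and assistant example can be sketched as a role-based view over a single record. This is a simplified illustration only: the role names, the field names, and the `ROLE_FIELDS` mapping are assumptions made up for this sketch, and a real access policy would usually be enforced by the data platform rather than by application code.

```python
# Hypothetical mapping from role to the fields that role is allowed to see.
ROLE_FIELDS = {
    "doctor": {"patient_id", "diagnosis", "prescriptions", "amount_due"},
    "assistant": {"patient_id", "amount_due"},
}


def view_for(role: str, record: dict) -> dict:
    """Return only the fields the given role may see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}


# Made-up sample record for illustration only.
record = {
    "patient_id": "P-001",
    "diagnosis": "seasonal allergy",
    "prescriptions": ["antihistamine"],
    "amount_due": 40.0,
}

doctor_view = view_for("doctor", record)
assistant_view = view_for("assistant", record)
```

The same underlying record yields a full view for the doctor and a payment-only view for the assistant, which is the sense in which privacy depends on who is asking and why.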

A practical guide to Data Anonymization

This new whitepaper, from our partner Okera, offers a practical guide to data anonymization as a way to protect confidential, personally identifiable, and regulated data from inappropriate access, so that analysts and data scientists can use the data responsibly.

Who will this help?

The considerations and techniques presented here help data teams who are responsible for:
• Provisioning data to the business for analytics purposes
• Protecting data from unlawful or other unauthorized access
• Complying with ethical use and data privacy regulations
• Respecting customer and partner data confidentiality agreements

The whitepaper provides strategies to de-identify different kinds of data by considering both the type of data and the characteristics of the underlying data sets.

You can download it for free here.
