When researching data de-identification, you’ll come across terms such as anonymization, pseudonymization, and generalization.
How do we de-identify data?
“Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.” – Wikipedia
“Pseudonymization is a data management and de-identification procedure by which personally identifiable information fields within a data record are replaced by one or more artificial identifiers, or pseudonyms. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing.” – Wikipedia
“A generalization is a form of abstraction whereby common properties of specific instances are formulated as general concepts or claims. Generalizations posit the existence of a domain or set of elements, as well as one or more common characteristics shared by those elements (thus creating a conceptual model). As such, they are the essential basis of all valid deductive inferences (particularly in logic, mathematics and science), where the process of verification is necessary to determine whether a generalization holds true for any given situation.” – Wikipedia
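To make the three definitions above more concrete, here is a minimal Python sketch of how each technique might look when applied to a single record. It is an illustration only, not the method described in the whitepaper. The field names, the sample record, the `SECRET_KEY` value, and the helper functions `anonymize`, `pseudonymize`, and `generalize_age` are all hypothetical. Note also that removing direct identifiers does not by itself guarantee anonymity, since the remaining fields may still identify someone in combination.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be stored securely.
SECRET_KEY = b"example-key-not-for-production"


def anonymize(record, direct_identifiers):
    """Remove the listed identifying fields from the record entirely."""
    return {k: v for k, v in record.items() if k not in direct_identifiers}


def pseudonymize(value, key=SECRET_KEY):
    """Replace a value with a keyed hash; the same input always yields the same pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def generalize_age(age, width=10):
    """Replace an exact age with a coarser range, e.g. 34 becomes '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical sample record.
record = {
    "name": "Alice Example",
    "email": "alice@example.com",
    "age": 34,
    "diagnosis": "asthma",
}

# Anonymization: the identifying fields are dropped.
anonymized = anonymize(record, {"name", "email"})

# Pseudonymization: the identifying fields are replaced by artificial identifiers.
pseudonymized = {
    **record,
    "name": pseudonymize(record["name"]),
    "email": pseudonymize(record["email"]),
}

# Generalization: the exact age is replaced by a range.
generalized = {**anonymized, "age": generalize_age(record["age"])}
```

In this sketch the pseudonymized record can still be joined with other records for the same person, because the pseudonym is stable, while the anonymized record cannot.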
The challenge with each of these terms is that real-world implementation, particularly when applied to data privacy, is open to interpretation. Data privacy is context-sensitive – my doctor may need complete access to my medical records, for example, while his assistant may only need access to my payment records.
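The doctor and assistant example can be sketched as a role-based view over the same record. This is a simplified illustration, assuming a fixed mapping from roles to permitted fields. The role names, field names, and the `view_for` helper are hypothetical and are not taken from the whitepaper or from any Okera product.

```python
# Hypothetical mapping of roles to the fields each role may see.
ROLE_FIELDS = {
    "doctor": {"name", "diagnosis", "medications", "balance_due"},
    "assistant": {"name", "balance_due"},
}


def view_for(role, record):
    """Return only the fields the role is allowed to see; unknown roles see nothing."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}


# Hypothetical patient record.
patient = {
    "name": "Alice Example",
    "diagnosis": "asthma",
    "medications": ["salbutamol"],
    "balance_due": 120.0,
}

doctor_view = view_for("doctor", patient)
assistant_view = view_for("assistant", patient)
```

Defaulting unknown roles to an empty view is a deliberate choice in this sketch: a role that has not been explicitly granted access sees no data.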
This new whitepaper, from our partner Okera, offers a practical guide to data anonymization. It focuses on protecting confidential, personally identifiable, and regulated data from inappropriate access, so that analysts and data scientists can still use the data responsibly.
Who will this help?
The considerations and techniques presented here help data teams who are responsible for:
• Provisioning data to the business for analytics purposes
• Protecting data from unlawful or other unauthorized access
• Complying with ethical use and data privacy regulations
• Respecting customer and partner data confidentiality agreements
The whitepaper provides strategies for de-identifying different kinds of data, taking into account both the type of data and the characteristics of the underlying data sets.
You can download it free from here