How to Enhance Privacy in Data Science

The new frontier of innovation calls for vast amounts of personal information to be digested into analytical products such as reports, data sets, and machine learning models. The more data an organization collects, the greater its risk of data leakage, misappropriation, and loss of customer trust. In this data-rich world, how should a modern organization proactively and sustainably develop controls and processes that guard personal data?

To help answer this, Immuta’s Research Team has developed an eBook that digs into some of the more prominent privacy-enhancing data anonymization techniques. This in-depth book provides you with an overview of the challenges and opportunities of privacy-aware analytics, along with actionable tools to help data analysts and data scientists implement anonymization techniques within their data projects. We also discuss the following techniques, including the circumstances under which the protections each affords can be compromised; a brief illustrative sketch of each follows the list.

  • De-identification: A process of replacing individual identifiers and, more generally, sensitive attributes with less meaningful, non-sensitive placeholder values.
  • k-Anonymization: A constraint on a dataset ensuring that no individual can be distinguished from at least k-1 others given knowledge of quasi-identifying attributes such as ZIP code, birth date, or biological sex. We also discuss two refinements of k-anonymization: l-diversity and t-closeness.
  • Differential Privacy: An advanced family of techniques that mathematically limit what an outsider can confidently infer about an analysis’s inputs from its outputs. Analytical products produced under differential privacy enable participating individuals to credibly deny their participation in the input.
  • Local Differential Privacy: A related family of techniques in which randomization is applied before data is ever collected, enabling participating individuals to credibly deny the contents of their own records.
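
To make these ideas concrete, here are minimal illustrative sketches in Python. They are not taken from the eBook; all field names, data, functions, and parameters below are invented for illustration. De-identification can be as simple as replacing direct identifiers with salted-hash placeholders; note that on its own this offers weak protection, which is part of why the refinements that follow exist.

    import hashlib

    # Hypothetical record; the fields and salt are invented for illustration.
    record = {"name": "Jane Doe", "ssn": "123-45-6789", "zip": "02139", "diagnosis": "flu"}

    def pseudonymize(value: str, salt: str = "org-secret-salt") -> str:
        """Replace a direct identifier with a truncated salted hash."""
        return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

    deidentified = dict(record)
    for field in ("name", "ssn"):  # direct identifiers
        deidentified[field] = pseudonymize(record[field])
    print(deidentified)  # quasi-identifiers such as zip remain untouched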
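
A dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows, so one simple check is to count each combination, as in this toy sketch:

    from collections import Counter

    # Toy rows of quasi-identifiers: (ZIP code, birth year, sex).
    rows = [
        ("02139", 1985, "F"),
        ("02139", 1985, "F"),
        ("02139", 1985, "F"),
        ("94105", 1990, "M"),
        ("94105", 1990, "M"),
    ]

    def is_k_anonymous(rows, k):
        """True if every quasi-identifier combination appears at least k times."""
        return all(count >= k for count in Counter(rows).values())

    print(is_k_anonymous(rows, 2))  # True: the smallest group has 2 rows
    print(is_k_anonymous(rows, 3))  # False: the second group has only 2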
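
One standard way to achieve differential privacy for a counting query is the Laplace mechanism: add noise scaled to the query’s sensitivity (1 for a count) divided by the privacy parameter epsilon. The sketch below uses NumPy and an invented toy dataset:

    import numpy as np

    def dp_count(values, predicate, epsilon=1.0):
        """Release a count with Laplace noise of scale sensitivity/epsilon = 1/epsilon."""
        true_count = sum(1 for v in values if predicate(v))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    ages = [34, 29, 41, 52, 38, 45, 27, 60]
    # Smaller epsilon means more noise and a stronger privacy guarantee.
    print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))

Because the released count could plausibly have arisen from the dataset with or without any one person’s record, each participant can credibly deny having contributed.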
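
Local differential privacy applies the randomization before data is ever collected. The classic example is randomized response: answer honestly on one coin flip, otherwise answer with a second coin flip, which yields epsilon = ln(3) local differential privacy while still letting the aggregator debias the totals:

    import random

    def randomized_response(truth: bool) -> bool:
        """Answer honestly on heads; otherwise report a fresh coin flip."""
        if random.random() < 0.5:
            return truth
        return random.random() < 0.5

    # Every respondent's true answer is True in this toy run.
    responses = [randomized_response(True) for _ in range(10_000)]
    reported = sum(responses) / len(responses)
    # E[reported] = 0.25 + 0.5 * true_proportion, so invert to debias:
    print(2 * (reported - 0.25))  # close to 1.0, the true proportion

Each recorded answer is plausibly the result of a coin flip, so any individual can credibly deny the contents of their own record.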

To download the eBook, visit: https://www.immuta.com/privacy-in-data-science/