Data Scientist’s Guide to Preserving Privacy

  • Alfred Rossi, Research Scientist
  • Stephen Bailey, Director, Data & Analytics

As personal data becomes more abundant, the risk that sensitive data will be leaked or misappropriated grows. The ability to augment datasets with publicly available data greatly exacerbates this risk, in part because aggregation erodes privacy: the combination of disparate and seemingly trivial pieces of personal information can be used to infer sensitive personal attributes. Consequently, organizations seeking to maintain trust with their customers must have robust frameworks in place to preserve privacy, both within their curated data and when those sources are joined with external data.

In this webinar, we present practical approaches to maintaining privacy and highlight the vulnerabilities of each approach within the analytics workflow. We discuss in detail three common techniques for data privatization: masking, k-anonymization, and differential privacy. We ground the discussion of each technique in a case study that highlights the trade-offs between data utility, privacy preservation, and robustness against linkage attacks.

You will walk away with:

  • a framework for identifying privacy risks in your own analyses
  • multiple approaches that can be used to preserve privacy
  • an understanding of how to make decisions that balance utility and privacy