Data Owners: Privacy is YOUR Problem


STEPHEN BAILEY

Published June 23, 2021

Last edited: November 7, 2025


Data has been called “the new gold” for its ability to transform and automate business processes; it has also been called “the new uranium” for its ability to violate the human right to privacy on a massive scale. And just as nuclear engineers could effortlessly enumerate the fundamental differences between gold and uranium, so too must data engineers learn to instinctively identify and separate dangerous data from the benign.

Take, for instance, the famous “link attack” that re-identified the medical records of several high-profile patients of Massachusetts General Hospital. In 1997, MGH released about 15,000 records from which names and patient IDs had been stripped. Despite these precautions, Harvard researcher Latanya Sweeney was able to connect publicly available voter information to the anonymized medical records by joining the two on three indirect identifiers: ZIP codes, birth dates, and genders. The join left Sweeney with only a handful of candidate records to sift through, and she went on to re-identify many individuals, most notably the Massachusetts governor.
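The join at the heart of a link attack is simple enough to sketch in a few lines. The Python below is a minimal illustration, not Sweeney's actual method or data: the records, the field names (`zip`, `birth_date`, `gender`), and the `link_records` helper are all made up for the example. It treats a person as re-identified when their combination of indirect identifiers matches exactly one record on each side.

```python
from collections import defaultdict

# Hypothetical column names for the three indirect identifiers.
QUASI_IDENTIFIERS = ("zip", "birth_date", "gender")


def group_by_keys(rows, keys):
    """Group rows by the tuple of values they hold for the given keys."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return groups


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Return (public_row, anonymized_row) pairs that match one-to-one."""
    anon_groups = group_by_keys(anonymized, keys)
    public_groups = group_by_keys(public, keys)
    matches = []
    for key, public_rows in public_groups.items():
        anon_rows = anon_groups.get(key, [])
        # Only an unambiguous match on both sides counts as re-identified.
        if len(public_rows) == 1 and len(anon_rows) == 1:
            matches.append((public_rows[0], anon_rows[0]))
    return matches


# Toy data, invented for illustration only.
medical = [
    {"zip": "00001", "birth_date": "1950-01-01", "gender": "M", "diagnosis": "condition A"},
    {"zip": "00001", "birth_date": "1962-03-15", "gender": "F", "diagnosis": "condition B"},
    {"zip": "00002", "birth_date": "1962-03-15", "gender": "F", "diagnosis": "condition C"},
    {"zip": "00002", "birth_date": "1962-03-15", "gender": "F", "diagnosis": "condition D"},
]
voters = [
    {"name": "Person One", "zip": "00001", "birth_date": "1950-01-01", "gender": "M"},
    {"name": "Person Two", "zip": "00002", "birth_date": "1962-03-15", "gender": "F"},
]

reidentified = link_records(medical, voters)
for person, record in reidentified:
    print(person["name"], "->", record["diagnosis"])
```

In this toy data, "Person One" is linked to a single medical record, while "Person Two" shares her identifiers with two records and stays ambiguous. Stripping names did nothing to prevent the first match; only the crowd around the second person protected her.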

More than twenty years later, every business is an MGH and every person with internet access is a potential Latanya Sweeney. Yet, we all want a world where data is handled responsibly, shared cautiously, and leveraged only for the right purposes. Our greatest limitation in realizing that world is not one of possibility but responsibility; it’s not a question of “How?” but “Who?”.

I believe data engineers must be the ones to take ownership of the problem and lead. Controlling the re-identifiability of records in a single dashboard is good analytics hygiene, but preserving privacy in the platform delivering the data is crucial. Managing privacy loss is a systemic problem demanding systemic solutions — and data engineers build the systems.

The mandate to protect privacy does not translate to a boring exercise in implementing business logic; it presents exciting new technical challenges. How can we quantify the degree of privacy protection we are providing? How can we rebuild data products — and guarantee they still function — after an individual requests that their data be deleted? How can we translate sprawling legal regulations into comprehensible data policies while satisfying data-hungry consumers?
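One starting point for the first question is to count how many records share each combination of indirect identifiers; the size of the smallest group is the dataset's k-anonymity. The sketch below is illustrative only: the `k_anonymity` and `generalize_zip` helpers, the column names, and the toy rows are assumptions made for the example, not part of any particular product or library.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not rows:
        raise ValueError("k-anonymity is undefined for an empty dataset")
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def generalize_zip(row, digits):
    """Return a copy of the row whose ZIP code keeps only its leading digits."""
    masked = dict(row)
    zip_code = row["zip"]
    masked["zip"] = zip_code[:digits] + "*" * (len(zip_code) - digits)
    return masked


# Toy rows, invented for illustration only.
rows = [
    {"zip": "00011", "birth_year": 1960, "gender": "F"},
    {"zip": "00012", "birth_year": 1960, "gender": "F"},
    {"zip": "00021", "birth_year": 1975, "gender": "M"},
    {"zip": "00022", "birth_year": 1975, "gender": "M"},
]
keys = ("zip", "birth_year", "gender")

k_before = k_anonymity(rows, keys)  # every row is unique on these keys
k_after = k_anonymity([generalize_zip(r, digits=4) for r in rows], keys)
print(k_before, k_after)
```

Here every raw row is unique, so k is 1; masking the last ZIP digit merges the rows into pairs and raises k to 2. The number is easy to compute and easy to track over time, though it says nothing about what an attacker can learn from the sensitive columns themselves.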

We will need to formulate a new set of engineering best practices that extends beyond the familiar domains of security and system design. Determining what is best practice requires much practice, though. It is essential that engineering leaders push their teams to understand and address the pertinent issues: the strengths and weaknesses of data masking, techniques like k-anonymity and differential privacy, and emerging technologies such as federated learning. Ultimately, data engineers should know the practice of privacy by design as intuitively as they do the principle of least privilege.
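To make one of these techniques concrete, here is a minimal sketch of differential privacy applied to a counting query using the Laplace mechanism. It is a teaching example under stated assumptions, not a hardened implementation: the `private_count` helper, the toy `visits` rows, and the choice of `epsilon` are hypothetical, and production systems need more care (for example around floating-point noise and tracking a privacy budget across queries).

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching predicate.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def is_flagged(row):
    """Hypothetical predicate: does this row carry the sensitive flag?"""
    return row["flagged"]


# Toy rows, invented for illustration only.
visits = [
    {"patient": "a", "flagged": True},
    {"patient": "b", "flagged": False},
    {"patient": "c", "flagged": True},
]

noisy = private_count(visits, is_flagged, epsilon=1.0)
print(noisy)  # a different value on each run, centered on the true count of 2
```

A smaller `epsilon` means more noise and stronger protection; a larger one means more accurate answers and weaker protection. That dial is what makes the trade-off between privacy and utility something a team can reason about explicitly.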

The alternative, if history is any guide, is a world in which institutions publish “anonymized” data to the world, and clever people and organizations reconstruct and repurpose private data for their own ends. Managing privacy, far from being an abstract concept for just philosophers and lawyers, has become a concrete problem perfectly suited for data engineers. It’s time they made it their own.

To read more about this topic and other essential tips for data engineers, check out 97 Things Every Data Engineer Should Know.

Stephen Bailey is Director of Data & Analytics at Immuta, where he strives to implement privacy best practices while delivering business value from data. He loves to teach and learn, on just about any subject. He holds a PhD in educational cognitive neuroscience from Vanderbilt and enjoys reading philosophy.

