Parsing Data Security from Privacy in the GDPR

The European Union’s General Data Protection Regulation, one of the most forward-leaning privacy regulations on the planet, was praised by Tim Cook in a recent speech in the EU, in which he warned that our personal data is “being weaponized against us with military efficiency.”

Those are strong words, and frankly, accurate: the GDPR is an attempt through regulation to de-weaponize our personal data. The regulation comprises 88 pages, 99 Articles and 173 Recitals, and will be enforced by member-state data protection authorities within the EU, which means interpretation will likely vary by country until the Court of Justice of the European Union rules on contested points.

It’s long, sometimes ambiguous and covers a good deal of technical territory, such as model explainability, pseudonymization and security, to name a few. This can make the GDPR very complex for an organization to digest, let alone translate into an approach that will eventually live in its technical stack.

A good way to think about GDPR compliance is by asking, “how do you maximize the value of your data assets in a controlled fashion?” This requires a risk-based assessment of how you’re using your data, and more importantly, how you should be using your data in a way your data subjects expect. One way to start assessing that risk is to separate the concepts of cybersecurity (well-understood, mature technology) from Privacy Enhancing Technology (PET), which is less understood among technical personnel. While the GDPR does have articles relevant to security, by and large, the regulation is focused on privacy preservation.

So, what’s the difference between privacy and security?

Security can be thought of as authentication, firewalls and encryption, including homomorphic encryption. It can feel black and white: you are either allowed in or you aren’t.

Privacy isn’t black and white, and finding a middle ground is key. This is commonly referred to as the “privacy vs. utility trade-off.” On one end is total privacy, which means no one can use the associated data; on the other end is full utility, which means anyone who wants to can use the data for any purpose.

Good privacy is a balancing act between these extremes. Privacy techniques enforced within companies provide assurances about how your data is used, or misused, without relying on blind trust in those companies or their employees. The GDPR terms this “data protection by design,” and all organizations need to be doing it.

Think of data privacy controls as a knob that can be tuned: on the left is pure randomness (no utility), and on the right is complete utility (no privacy measures enforced at all). All too frequently, organizations have this knob turned all the way to the right.

Differential privacy, for example, is an anonymization technique that adds noise to query responses in a way that provides mathematical privacy guarantees for the data subjects in the source data. Differential privacy would sit toward the left of that knob, but still provide a good enough mix of privacy and utility for meaningful analysis.
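To make the idea concrete, here is a minimal sketch of a differentially private counting query using the Laplace mechanism. The dataset, station names and the `dp_count` helper are illustrative, not part of any particular product; a counting query has sensitivity 1, so Laplace noise with scale 1/ε gives ε-differential privacy.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sample from Laplace(0, scale).
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(records, predicate, epsilon=0.5):
    """Answer a counting query with epsilon-differential privacy.

    One person joining or leaving the dataset changes a count by at
    most 1 (sensitivity = 1), so noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical listener data: which station does each user favor?
listeners = [{"user": i, "station": "FM101" if i % 3 else "FM205"}
             for i in range(1000)]

noisy = dp_count(listeners, lambda r: r["station"] == "FM101", epsilon=0.5)
```

An analyst sees a count that is close enough for trend analysis (here, which station is more popular), while no individual listener's presence in the data can be confidently inferred from the answer.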

Data masking techniques such as tokenization and k-anonymization result in less privacy but provide more utility; this is what the GDPR terms pseudonymization. The knob should be tuned based on who you are in the organization and what you’re trying to do with the data; privacy controls should not, and cannot, ever be black and white.

But there’s another catch: Purpose

PETs are only one part of the privacy story. The cornerstone of the GDPR is purpose: you must wrap your PETs in the context of why you are processing the data, understand the purpose behind every interaction with your data, and determine each purpose beforehand and limit it in scope.

Some organizations break up their use cases by the level of anonymization needed for each purpose. If they’re doing a statistical analysis – for example, which radio stations are most popular in which areas – this is a great use case for a technique like differential privacy. In cases where they need to break privacy for specific discovered outliers – for example, which patients are at risk of heart disease – then pseudonymization is appropriate because it’s wrapped in a stated purpose that cannot be ignored (e.g. you cannot re-identify an individual for any reason other than heart-disease risk).
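One way to enforce that last constraint in code is to tie every re-identification request to a declared, pre-approved purpose. The sketch below is a hypothetical illustration (the vault contents, token and purpose names are invented), not a description of any specific product's API.

```python
# Purposes approved in advance by the data protection process.
APPROVED_PURPOSES = {"heart-disease-risk"}

# Token vault mapping pseudonyms back to real identifiers (illustrative).
TOKEN_VAULT = {"tok_9f2a": "patient-4471"}

def reidentify(token: str, purpose: str) -> str:
    """Resolve a pseudonym only when the caller declares an approved
    purpose; every call is thus auditable against a stated reason."""
    if purpose not in APPROVED_PURPOSES:
        raise PermissionError(
            f"Purpose '{purpose}' is not approved for re-identification"
        )
    return TOKEN_VAULT[token]
```

A clinician following up on heart-disease risk can resolve the token; a marketing query with any other declared purpose is refused, which is the purpose limitation the GDPR asks for expressed as an access control.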

Certainly, there’s overlap between security and privacy, and without security there is no privacy. As you build solutions into your GDPR technology stack, you must keep the concepts and goals of both security and privacy in sight. If you think only about security, you protect yourself from a shorter list of threats: unauthorized access, unwanted modification of data and loss of data.

Each data protection principle is linked to its own threats, so when you include privacy controls, you protect against a broader range: unnecessary profiling, obsolescence of personal data, unfair or opaque processing of personal data, and the inability to generate a compliance narrative or to honor data subject rights. Covering both the security and the privacy threat landscapes is therefore the only way to remain GDPR compliant.

To learn more about privacy enhancing technologies and how to implement them, read our eBook, How to Enhance Privacy in Data Science.