How to Mask PII Data: A Guide to Data Masking Techniques

According to the Identity Theft Resource Center, there were 1,862 data breaches in 2021, exceeding the previous record of 1,506 in 2017. Of these breaches, 83% contained sensitive information that became available to the attackers.

The exposure of large swaths of raw data, especially when sensitive, can have dangerous consequences. In today’s data-driven world, the implicit expectation that organizations make safe and responsible use of the data they collect is a large part of consumer trust and public image. No company wants to be at the center of a breach scandal that leaks the information of millions of its customers.

To avoid violating this trust and putting customers in harm’s way, it is imperative that organizations take steps to protect any personally identifiable information (PII) that they are generating or collecting. This is where dynamic data masking techniques can help. These masking techniques preemptively alter this information, making it much more difficult to access or comprehend in the event of an attack. Let’s dive deeper into the most common data masking techniques to see which may be best for your organization.

What is Personally Identifiable Information (PII)?

You may be familiar with the story of Hansel and Gretel, siblings who left behind trails of stones and breadcrumbs in order to find their way out of the forest and back home.

In the fairytale, the characters purposefully left a trail of information to lead them to safety. In modern practice, however, individuals don’t always have the choice of which information they’re leaving behind – or how it’s being handled.

For any digital citizen, the daily use of technology leaves behind a trail of PII. By definition, PII is any information that can, either directly or indirectly, reasonably confirm the identity of the individual to whom it applies. Examples of PII include full names, email addresses, geographic locations, bank account numbers, driver’s license numbers, or any other information that can be connected directly back to an individual.

Many of the online forms and apps you use each day request your PII, from streaming service subscriptions that require your email and credit card number, to GPS-based apps that have access to your location. This information is produced, collected, and used constantly. And it can be very detrimental if this data falls into the wrong hands.

Why Does PII Need to be Protected?

So why is it a business imperative to choose a proper data masking technique and protect the PII your organization collects?

Many modern data laws and regulations require PII to be secured using verified techniques and processes. HIPAA, CCPA, GDPR, and virtually every other data privacy measure have language around the need to protect this information.

Beyond legal requirements, it’s important to consider the general danger of not protecting PII. In 2013, the FTC charged that the Android app “Brightest Flashlight Free” had been sharing precise location and device information with third parties. Not only did this app deceive users, who never deliberately chose to share this information, but the FTC complaint also ended in a settlement with the app’s creator.

The PII shared by the app, if analyzed freely and unmasked, could show the exact locations of users who thought they were simply using a flashlight. This direct identifier, meaning information relating specifically to the individual, was visible to anyone with access to that data. Indirect identifiers like age, salary, and date of birth can be true for, but not unique to, the individual in question. Even so, an aggregation of either or both of these types of PII from different data sets can provide the necessary context for someone to determine the identity of the user it applies to.

Types of Data Masking

Data masking is a subset of data access control that takes existing data and creates a fake (but convincing) alternate version of it. This is done by changing the sensitive values in a data set. The new “masked” versions of the data are difficult to reverse-engineer, which makes it hard to re-identify data subjects in the event of a breach.

Data masking techniques can take a few distinct forms:

Static Data Masking

Static data masking creates a copy of the data set you’re looking to mask and removes all PII. The copied version is then able to be shared without risking PII exposure. While this version is secure, it may cause confusion because it eliminates a single source of truth for the data set.

Dynamic Data Masking

Rather than requiring data to be copied or moved, dynamic data masking applies masking techniques on the fly, as data is queried or shifted between parts of the testing/development environment. The underlying data is left unchanged, so it can be protected effectively while still maintaining a single source of truth.
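To make the idea concrete, here is a minimal Python sketch of masking applied at read time. The records, the role names, and the `read_records` helper are all hypothetical and exist only for illustration; a real platform would enforce this in the query layer rather than in application code.

```python
# Hypothetical stored records; the source data is never modified.
RECORDS = [
    {"name": "Ada Park", "email": "ada@example.com", "city": "Austin"},
    {"name": "Raj Iyer", "email": "raj@example.com", "city": "Denver"},
]

# Hypothetical policy: the columns each role may see unmasked.
UNMASKED_COLUMNS = {
    "admin": {"name", "email", "city"},
    "analyst": {"city"},
}

def read_records(role):
    """Return a masked view built at read time; RECORDS itself stays intact."""
    allowed = UNMASKED_COLUMNS.get(role, set())  # unknown roles see nothing
    return [
        {col: (val if col in allowed else "***") for col, val in row.items()}
        for row in RECORDS
    ]

analyst_view = read_records("analyst")
```

Because the mask is computed on each read, there is only one copy of the data to maintain, and changing a role’s policy takes effect on the next query.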

Other relevant data masking techniques include deterministic and real-time masking, which mask data based on specific values or in small batches, respectively. You can read more about these techniques and explore top data masking tools in this article.
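As a rough illustration of deterministic masking, the Python sketch below maps each input value to the same masked output every time, so masked columns can still be joined or grouped. Using a keyed HMAC for this is our assumption, not something prescribed above, and `SECRET_KEY` and `mask_deterministic` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key for the demo; in practice it would live in a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"

def mask_deterministic(value):
    """Map a value to a fixed 16-character hex string; same input, same output."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]

masked_a = mask_deterministic("ada@example.com")
masked_b = mask_deterministic("ada@example.com")  # identical to masked_a
```

Anyone without the key cannot recompute the mapping, while anyone with it can consistently mask new data the same way.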

Popular Data Masking Techniques

Whether done in a static, dynamic, deterministic, or real-time manner, there is a wide range of data masking techniques that can be used to effectively protect PII in your data environment. These techniques include:

k-Anonymization

One of the most secure masking techniques, k-anonymization is often likened to “hiding in the crowd.” This method generalizes or groups records with similar attributes so that each record is indistinguishable from at least k-1 others in the data set. Since the combined data could refer to any member of the group, singling out an individual becomes far more difficult.
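Here is a small Python sketch of how a k-anonymity check might look. It assumes age and ZIP code are the indirect identifiers and coarsens them into decade bands and three-digit ZIP prefixes; the `generalize` and `is_k_anonymous` helpers and the sample rows are hypothetical.

```python
from collections import Counter

def generalize(row):
    """Coarsen indirect identifiers: age -> decade band, ZIP -> first 3 digits."""
    low = (row["age"] // 10) * 10
    return (f"{low}-{low + 9}", row["zip"][:3] + "**")

def is_k_anonymous(rows, k):
    """True if every generalized combination is shared by at least k rows."""
    groups = Counter(generalize(r) for r in rows)
    return all(count >= k for count in groups.values())

rows = [
    {"age": 31, "zip": "78701"},
    {"age": 36, "zip": "78704"},
    {"age": 38, "zip": "78745"},
    {"age": 52, "zip": "80202"},
    {"age": 57, "zip": "80205"},
]

print(is_k_anonymous(rows, 2))  # True: the smallest group has 2 rows
print(is_k_anonymous(rows, 3))  # False: the 50-59 / 802** group has only 2
```

If the check fails for your chosen k, the usual options are to generalize further (wider bands, shorter prefixes) or to suppress the rows in undersized groups.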

Encryption

Another extremely effective masking technique, encryption scrambles data values and requires a specific decryption key to unscramble. This technique is very secure, but can also require advanced technologies to carry out.

Differential Privacy

Differential privacy is applied by injecting randomized “noise” into the results of data analysis. This creates an environment where aggregate patterns in the original information are still available for analysis, but a viewer would not be able to identify data subjects at an individual level.
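One common way to add this noise is the Laplace mechanism. The Python sketch below applies it to a simple counting query; the `noisy_count` helper, the sample ages, and the choice of epsilon are assumptions made for illustration, and a production system would also need to track the privacy budget across queries.

```python
import random

def noisy_count(values, predicate, epsilon, rng=random):
    """Counting query with Laplace noise; a count has sensitivity 1."""
    true_count = sum(1 for v in values if predicate(v))
    rate = epsilon / 1.0  # epsilon divided by the query's sensitivity
    # The difference of two exponential samples is Laplace-distributed
    # with scale 1 / rate.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise

ages = [25, 41, 38, 52, 67]
over_40 = noisy_count(ages, lambda a: a >= 40, epsilon=1.0)
```

Smaller values of epsilon add more noise and therefore more privacy, at the cost of less accurate answers.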

Nulling

One of the simpler data masking techniques, nulling plainly removes values from a data set based on a viewer’s authorization. For instance, identifiable information like an address would be listed as “null” for any viewer without the proper credentials or permissions.

Redaction

Similar to nulling, data redaction removes or substitutes certain values in the data set based on permissions. For reference, think of declassified government documents with entire sections blacked out.
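For free-text fields, redaction is often pattern-based. The Python sketch below blacks out email addresses and all but the last four digits of U.S. Social Security numbers; the two regular expressions and the `redact` helper are simplified assumptions and would miss many real-world formats.

```python
import re

# Simplified, hypothetical patterns for the demo.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def redact(text):
    """Hide SSNs (keeping the last 4 digits) and replace email addresses."""
    text = SSN_PATTERN.sub(r"XXX-XX-\1", text)
    return EMAIL_PATTERN.sub("[REDACTED EMAIL]", text)

print(redact("SSN 123-45-6789, mail jo.doe@example.com now"))
# SSN XXX-XX-6789, mail [REDACTED EMAIL] now
```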

Pseudonymization

Brought to relevance by the General Data Protection Regulation (GDPR), pseudonymization is more an aggregate of data masking techniques than its own unique method. It is any combination of techniques that removes direct identifiers or indirect identifiers from a data set.

Averaging

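This technique removes specific values from data sets and replaces them with average values. An example could be taking specific ages listed in a demographic data set and replacing each one with the average age of its group. Replacing ages with age ranges is a closely related approach, usually called generalization.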
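A short Python sketch of averaging follows. It assumes each record belongs to a group (here, a department) and replaces the individual value with that group’s mean; the `average_by_group` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def average_by_group(rows, group_key, value_key):
    """Replace each row's value with the mean value of its group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    averages = {group: mean(vals) for group, vals in groups.items()}
    # Build new rows so the input list is left untouched.
    return [{**row, value_key: averages[row[group_key]]} for row in rows]

rows = [
    {"dept": "A", "age": 30},
    {"dept": "A", "age": 40},
    {"dept": "B", "age": 50},
]
masked = average_by_group(rows, "dept", "age")
```

Group totals and means are preserved, so aggregate analysis still works. Note that a group with a single member, like department B here, gets no real protection from averaging.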

Substitution

Just like it sounds, this method substitutes values in a data set with other realistic-looking values that do not impact the data’s meaning or utility. This helps maintain the applicability of the set without revealing too much PII.
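As a minimal Python sketch, substitution can be as simple as swapping real names for entries from a list of stand-ins. The `FAKE_NAMES` list and the `substitute_names` helper are hypothetical; a real tool would draw from a much larger pool so that distinct people are unlikely to receive the same stand-in.

```python
import random

# Hypothetical pool of realistic-looking replacement values.
FAKE_NAMES = ["Alex Morgan", "Sam Rivera", "Jordan Lee", "Casey Kim"]

def substitute_names(rows, rng):
    """Swap each real name for a fake one, reusing the same fake per person."""
    mapping = {}
    masked = []
    for row in rows:
        real = row["name"]
        if real not in mapping:
            mapping[real] = rng.choice(FAKE_NAMES)
        masked.append({**row, "name": mapping[real]})
    return masked

rows = [
    {"name": "Ada Park", "plan": "pro"},
    {"name": "Raj Iyer", "plan": "free"},
    {"name": "Ada Park", "plan": "pro"},
]
masked = substitute_names(rows, random.Random())
```

Reusing the same stand-in for repeated values keeps the masked data internally consistent, which matters for testing and analytics.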

Tokenization

This method replaces PII in a data set with a “token” that has no extrinsic meaning or value. A key that reveals the meaning of the token is separated from the data set by firewalls, so only those granted access to both will be able to decipher and utilize the data.
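The token-and-key arrangement can be sketched in Python as a small in-memory vault. The `TokenVault` class is hypothetical and stands in for what would normally be a separate, tightly secured service.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory vault; real vaults live behind access controls."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random token for the value, reusing any existing token."""
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)  # random, carries no meaning
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        """Look up the original value; only callers with vault access can do this."""
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
original = vault.detokenize(token)
```

Unlike encryption, the token is not derived from the original value, so it cannot be reversed without access to the vault’s mapping.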

Choosing a Data Masking Technique That’s Right for You

The protection of individuals’ PII is not only desirable, but indeed vital to the operation of a modern organization. Taking action to preemptively protect data from possible dangerous breaches keeps organizations safe, ethical, and trustworthy.

With so many data masking methods available, it’s important to examine the options and choose whichever are best suited for your organization. You should seek an option that is scalable, universally compatible across cloud platforms, compliant, and auditable to ensure that it’s working effectively.

If you’re looking for an automated data access platform that employs dynamic data masking, k-anonymization, and other privacy-enhancing technologies (PETs) to help protect PII, look no further than Immuta. Our platform can protect your most sensitive data without slowing time-to-access or efficiency.

To take a self-guided look at how protective policies are created and implemented with Immuta, try our walkthrough demo today.
