What Are the Most Common Types of Data Masking?


Data masking is a type of data access control that changes sensitive information in a data set by replacing it with fake but convincing versions of the original data. Because nearly every organization must protect sensitive data, data masking has to adapt to many different data environments. Whatever the size, purpose, or tools of your data stack, there is a type of data masking that fits your use case.

Data masking has historically taken two major forms, along with several secondary forms that are applied less frequently. Each form suits different scenarios and data environments, so weighing their strengths and weaknesses will help you decide which best fits your organization. Here are the most common types of data masking:

Static Data Masking (SDM)

Static data masking (SDM), the first of the two major types, masks data at rest rather than data in active use. SDM creates a copy of an existing data set and scrubs it of all sensitive and/or personally identifiable information (PII); the masked copy can then be stored, shared, and accessed without putting sensitive information at risk.

The most important aspect of SDM is that it works on a copy of the existing data. The masked output is detached from the original data set, with no connection tying the two together. As a result, the copy will not reflect later updates to the source unless you create a new, more current copy.
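To make the “detached copy” idea concrete, here is a minimal Python sketch of static masking. The records, field names, and masking rules are hypothetical, and a real SDM tool would operate on database tables rather than in-memory dictionaries. The point is only that the masked copy is produced once and then no longer tracks the source.

```python
import copy
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical production records; field names are illustrative only.
production: List[Row] = [
    {"id": 1, "name": "Marie Curie", "email": "marie@example.com", "plan": "pro"},
    {"id": 2, "name": "Ada Lovelace", "email": "ada@example.com", "plan": "free"},
]

# Assumed masking rules: realistic-looking but fake replacements per PII column.
MASKERS: Dict[str, Callable[[Row], str]] = {
    "name": lambda row: f"User {row['id']}",
    "email": lambda row: f"user{row['id']}@example.com",
}


def static_mask(records: List[Row]) -> List[Row]:
    """Return a masked deep copy; the source records are never modified."""
    masked = copy.deepcopy(records)
    for row in masked:
        for field, rule in MASKERS.items():
            if field in row:
                row[field] = rule(row)
    return masked


test_copy = static_mask(production)

# A later change to production is not reflected in the static copy.
production[0]["plan"] = "enterprise"
print(test_copy[0])  # {'id': 1, 'name': 'User 1', 'email': 'user1@example.com', 'plan': 'pro'}
```

Refreshing the test copy means calling `static_mask` again on the current data, which is exactly the re-copying cost described above.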

When to Use Static Data Masking

Static data masking is best suited to environments where data serves a single purpose and does not need to change over time. Software and application development, as well as training, are examples of environments where SDM is useful. When building a new tool or application, developers need to test their software with data realistic enough to be treated the same way real data would be.

Because static data masking scrubs real data sets of all sensitive information, it strikes a balance between utility and safety in a testing environment. Developers can run tests that behave realistically without worrying that the data could be exposed or misused. However, as data volumes grow and different levels of access are introduced, this approach becomes hard to scale, since each combination may require its own masked copy. For the same reason, static data masking is not recommended for analytical use cases, which require “live” data that reflects updates in real time rather than a stale snapshot.

Dynamic Data Masking (DDM)

The second major type of data masking, dynamic data masking (DDM), does not move or copy data. Instead, it takes the more agile approach of applying masking techniques at query time. DDM applies the same types of masking techniques as SDM, but does so without separating the data from its original source.

This maintains a single source of truth for the data set rather than producing multiple scrubbed and masked copies for various uses, which helps teams avoid the confusion and data silos that unnecessary copies create. Most importantly, because the data is never copied, it remains “live” and up to date, which is critical for analytical use cases.
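The following Python sketch illustrates query-time masking. The table, the roles, and the `query` function are hypothetical stand-ins; production DDM is usually enforced by the database or a policy layer in front of it, not by application code like this. What the sketch shows is that masking happens when data is read, so the stored data is never duplicated and updates appear immediately.

```python
from typing import Dict, List, Set

# Single source of truth: the only copy of the (hypothetical) customer data.
CUSTOMERS: List[Dict[str, object]] = [
    {"id": 1, "name": "Marie Curie", "ssn": "123-45-6789"},
    {"id": 2, "name": "Ada Lovelace", "ssn": "987-65-4321"},
]

# Hypothetical policy: columns each role may see unmasked.
# Roles not listed here see every column masked.
UNMASKED_COLUMNS: Dict[str, Set[str]] = {
    "compliance": {"id", "name", "ssn"},
    "analyst": {"id"},
}


def mask_value(column: str, value: object) -> object:
    if column == "ssn":
        return "***-**-" + str(value)[-4:]  # partial masking: keep the last four digits
    return "REDACTED"


def query(role: str) -> List[Dict[str, object]]:
    """Mask at read time; CUSTOMERS itself is never modified or copied to storage."""
    allowed = UNMASKED_COLUMNS.get(role, set())
    return [
        {col: (val if col in allowed else mask_value(col, val)) for col, val in row.items()}
        for row in CUSTOMERS
    ]


print(query("analyst")[0])  # {'id': 1, 'name': 'REDACTED', 'ssn': '***-**-6789'}

# Because nothing was copied, an update to the source is visible on the next query.
CUSTOMERS[1]["name"] = "Ada King"
print(query("compliance")[1]["name"])  # Ada King
```

Adding a new audience here means adding one policy entry, not producing another masked copy of the data.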

When to Use Dynamic Data Masking

Of all the types of data masking, dynamic data masking may be the most widely applicable. Because masking is enforced at query time, it is not limited by where the data is stored or copied to; it provides a “live” view of the data. It also allows more complex logic across different sets of users, since masking is applied at runtime instead of requiring a separate copy for every scenario, as SDM does. As a result, DDM supports more complex policy scenarios and use cases, including dynamically retaining or destroying referential integrity.
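Retaining or destroying referential integrity can be pictured as a per-query choice between two kinds of tokens. In the Python sketch below, which is an illustration rather than any product's method, a keyed HMAC-SHA256 produces the same token for the same input, so masked IDs still join across tables, while a random token breaks any link between rows. The key, the sample tables, and the `keep_joins` flag are assumptions; in practice a policy engine would decide which behavior a given user receives.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret held only by the masking layer (never hard-code one in practice).
SECRET_KEY = b"example-key-for-illustration-only"


def tokenize(value: str, keep_joins: bool) -> str:
    if keep_joins:
        # Same input + same key -> same token, so masked keys still match across tables.
        return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]
    # A fresh random token on every call: masked keys can no longer be joined.
    return secrets.token_hex(8)


# Hypothetical tables sharing a customer_id key.
customers = [{"customer_id": "C-42", "region": "EU"}]
orders = [{"customer_id": "C-42", "total": 99}]


def masked_join(keep_joins: bool) -> list:
    masked_customers = [{**c, "customer_id": tokenize(c["customer_id"], keep_joins)} for c in customers]
    masked_orders = [{**o, "customer_id": tokenize(o["customer_id"], keep_joins)} for o in orders]
    return [
        (c["region"], o["total"])
        for c in masked_customers
        for o in masked_orders
        if c["customer_id"] == o["customer_id"]
    ]


print(masked_join(keep_joins=True))   # [('EU', 99)]
print(masked_join(keep_joins=False))  # [] (random tokens do not match)
```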

Since dynamic masking maintains a single source of truth for data sets and does not require copying data, this “live” view is well suited to analytical use cases. Compliance is also much easier to manage, because complex policy scenarios can be handled without making hundreds of copies of your data. And instead of manually maintaining and auditing numerous copies of a data set, policy is enforced, monitored, and audited in a single, consistent location.

Deterministic Data Masking

A less common type of masking, deterministic data masking is a straightforward approach that replaces specific values in a data set with different, predetermined values. For example, every appearance of the name “Marie” in a demographic data set could be replaced with the name “Maxine.”
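A minimal Python sketch of that substitution follows. The lookup table is hypothetical; the defining property is that the same input always yields the same output, which is what makes the masking deterministic.

```python
from typing import Dict, List

# Hypothetical substitution table; a real deployment would need far more entries.
SUBSTITUTIONS: Dict[str, str] = {"Marie": "Maxine", "Ada": "Alice"}


def deterministic_mask(values: List[str]) -> List[str]:
    # Every occurrence of a value maps to the same replacement.
    # Values without a rule pass through unchanged here, which would itself be a leak.
    return [SUBSTITUTIONS.get(v, v) for v in values]


print(deterministic_mask(["Marie", "Ada", "Marie", "Grace"]))
# ['Maxine', 'Alice', 'Maxine', 'Grace']
```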

The simplicity of this model is both its greatest asset and its greatest weakness. While it is easy to apply, the substitution logic is simple enough to be reverse-engineered, which makes it a potential target for attackers and increases the risk of a data leak.
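To illustrate why simple deterministic masking can be reverse-engineered, the Python sketch below uses an unkeyed SHA-256 hash as a stand-in for a simple deterministic rule (an assumption; no specific rule is prescribed here). Because anyone can recompute the rule, an attacker who guesses likely input values can match them against the masked output, and repeated values reveal which records share an identity.

```python
import hashlib
from typing import List


def naive_mask(name: str) -> str:
    # Deterministic and unkeyed: anyone can recompute this output.
    return hashlib.sha256(name.encode()).hexdigest()


masked_column: List[str] = [naive_mask(n) for n in ["Marie", "Ada", "Marie"]]

# Dictionary attack: hash a list of likely values and look for matches.
candidate_names = ["Alice", "Ada", "Grace", "Marie"]
lookup = {naive_mask(c): c for c in candidate_names}
recovered = [lookup.get(h, "?") for h in masked_column]
print(recovered)  # ['Marie', 'Ada', 'Marie']
```

Mixing in a secret key (for example, with HMAC) raises the bar, because an attacker without the key cannot recompute the mapping, but identical inputs still produce identical outputs.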

When to Use Deterministic Data Masking

This form of masking is often praised for its consistency. Because values are masked the same way across data stacks, users can rely on their data being masked and protected whenever they access it. It is also valued for its simplicity, though, as noted above, that simplicity can also be a weakness.

Deterministic data masking may be useful in limited cases where data is not at elevated risk of a breach. Regardless of the level of risk, however, it remains easier to reverse-engineer than more dynamic approaches to masking. Even though deterministic masking is simple and logical to apply, on its own it may not be the best choice for protecting your sensitive data.

Real-Time (“On-the-Fly”) Data Masking

The other less common type of masking, real-time data masking, takes a conservative approach to protecting data. Instead of masking entire data sets as they are accessed or copied, as DDM and SDM do, this method pulls and masks data ad hoc as applications or users request it. It is applied when data is sent between production and testing or development environments, and it requires much less storage than methods like SDM that make whole copies.
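Masking “on the fly” maps naturally onto a stream: each record is masked as it passes from the source to the requester. The Python generator below is a minimal sketch under that assumption; `read_production` and the email-masking rule are hypothetical, standing in for a real production cursor and a real policy.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, str]


def read_production() -> Iterator[Row]:
    # Hypothetical stand-in for a cursor streaming rows out of production.
    yield {"id": "1", "email": "marie@example.com"}
    yield {"id": "2", "email": "ada@example.com"}


def mask_in_flight(rows: Iterable[Row]) -> Iterator[Row]:
    # Each row is masked only at the moment it is requested; no masked copy is staged.
    for row in rows:
        user, _, domain = row["email"].partition("@")
        yield {**row, "email": (user[:1] or "*") + "***@" + domain}


for row in mask_in_flight(read_production()):
    print(row)
# {'id': '1', 'email': 'm***@example.com'}
# {'id': '2', 'email': 'a***@example.com'}
```

Nothing is stored beyond the row currently in flight, which is where the storage savings come from; the flip side is that every consumer has to go through this masking step.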

This method is inherently more secure than deterministic and static data masking, because data is masked case by case to ensure protection and compliance.

When to Use Real-Time Data Masking

Given its ad hoc nature, real-time data masking offers flexibility similar to that of dynamic data masking. As the name implies, it is done “on the fly,” meaning it does not depend on systems being set up in advance and applied passively. This can work well for a smaller data ecosystem that supports a variety of data use cases.

The drawback of this ad hoc nature, however, is that real-time data masking is inherently less scalable than dynamic data masking. Masking data case by case may work on a small scale, but as an organization grows and expands its data stack, data sources, and data users, masking on the fly becomes increasingly difficult and time-consuming.

Which Is the Best Type of Data Masking?

Given what we know about each of these types of data masking, each has its own merits. In certain use cases, static, deterministic, or real-time data masking may be the most effective technique for an organization to apply to its sensitive data.

However, only one type has the scalability and flexibility to adapt as an organization grows and changes. Dynamic data masking can support techniques ranging from k-anonymization to differential privacy and beyond, putting many effective masking tools at an organization’s disposal. When these techniques are enforced automatically at query time across an entire data environment, they help ensure that sensitive data is protected against leaks and breaches at all times, while maintaining a single source of truth and without slowing time-to-access or reducing efficiency.
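As a rough illustration of one technique named above, k-anonymization aims to make every record indistinguishable from at least k-1 others on its quasi-identifiers (attributes such as age or ZIP code that can identify someone in combination). The Python sketch below computes k for a small hypothetical table and shows how generalizing exact ages into 10-year bands raises it. It covers only the check and a single generalization step; how a DDM platform would apply such a technique at query time is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence


def generalize_age(age: int) -> str:
    # Replace an exact age with a 10-year band, e.g. 34 -> "30-39".
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def k_of(rows: List[Dict], quasi_identifiers: Sequence[str]) -> int:
    # k = size of the smallest group of rows sharing identical quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


# Hypothetical records; ZIP codes are already truncated.
rows = [
    {"age": 34, "zip": "021**"},
    {"age": 36, "zip": "021**"},
    {"age": 52, "zip": "946**"},
    {"age": 57, "zip": "946**"},
]

print(k_of(rows, ["age", "zip"]))  # 1: every exact age is unique
banded = [{**r, "age": generalize_age(r["age"])} for r in rows]
print(k_of(banded, ["age", "zip"]))  # 2
```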

Immuta’s Data Security Platform provides organizations with a holistic, consistent, and scalable model that automatically applies dynamic data masking to protect sensitive data. With room to adapt, our platform ensures that your data can be masked dynamically now and into the future.

To see how creating a policy in Immuta enforces data masking at scale, try our new hands-on walkthrough demo.
