What Are the Most Common Types of Data Masking?
Data masking is a form of data access control that replaces sensitive information in a data set with fake but convincing versions of the original values. Because nearly every organization needs to protect sensitive data, data masking must adapt to any data environment. Regardless of the size, purpose, or tools in your data stack, there is a type of data masking that fits your use case. Each type has its own strengths, and examining them in turn will help you decide which best suits your organization’s needs.
Here are some of the most common types of data masking:
Static Data Masking (SDM)
Static data masking (SDM) masks data at rest rather than in active use. This is done by creating a copy of an existing data set and subsequently scrubbing it of all sensitive and/or personally identifiable information (PII) through a range of masking techniques. Once this new set of scrubbed data is created and masked, it can be shared without the risk of sensitive data being leaked or accessed by the wrong people.
The most important aspect of SDM is that it makes a copy of existing data. This means that the masked output of the SDM process is detached from the initial data, with no connections tying the two together.
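The copy-then-scrub flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production masking routine: the record layout, the `SENSITIVE_FIELDS` set, and the `static_mask` helper are all hypothetical names invented for this example.

```python
import copy

# Hypothetical production records; the field names are illustrative only.
production_rows = [
    {"id": 1, "name": "Marie Curie", "email": "marie@example.com", "plan": "pro"},
    {"id": 2, "name": "Alan Turing", "email": "alan@example.com", "plan": "free"},
]

# Assumption: these are the columns treated as sensitive.
SENSITIVE_FIELDS = {"name", "email"}


def static_mask(rows, sensitive_fields):
    """Return a detached, scrubbed copy of rows; the input is left untouched."""
    masked = copy.deepcopy(rows)  # the copy shares no objects with the original
    for row in masked:
        for field in sensitive_fields:
            if field in row:
                # Replace the real value with a placeholder derived from the row id.
                row[field] = f"{field}_{row['id']}"
    return masked


# The masked copy can be handed to a test environment.
test_rows = static_mask(production_rows, SENSITIVE_FIELDS)
```

Because `test_rows` is a deep copy, later changes to `production_rows` never reach it, which is the detachment described above.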
When to Use Static Data Masking
Static data masking is best suited for software and application development or training environments. When creating a new tool or application, developers need to test their software with data that is realistic enough to be treated in the same way real data would be. However, they generally can’t use true data sets without risking sensitive data leakage.
Since static data masking scrubs real data sets of all sensitive information, it strikes a balance between utility and safety in a testing environment. Developers can run tests that respond realistically without having to worry that the data could be exposed or misused. In an evolving data stack, however, this utility is limited by scale, since every new use requires another masked copy to create and maintain.
Dynamic Data Masking (DDM)
Dynamic data masking (DDM) does not move or copy data. Instead, it takes the more agile approach of applying masking techniques as data actively moves between parts of the testing/development/production environment. DDM applies the same types of data masking techniques as SDM, but it occurs without needing to separate the data from its original source.
This maintains a single source of truth for the data set, rather than making multiple copies of scrubbed and masked data for various uses. DDM helps teams avoid the pitfalls of confusion and data silos that arise from creating many unnecessary copies of the data.
When to Use Dynamic Data Masking
Of the types of data masking, dynamic data masking may be the most widely applicable. Since this type of masking is applied as data moves between parts of the data stack, it is not limited by where the data is stored or copied to. It is applied at query time, and can therefore determine (through data access control measures) what should be masked for which users.
Since dynamic masking does not require copying data and maintains a single source of truth for data sets, compliance is much easier to manage. Instead of needing to maintain and audit numerous copies of a data set, streamlined data policy enforcement is able to automatically apply the necessary masking measures on any and all queries.
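As a rough sketch of query-time masking, the Python below keeps one stored data set and decides what each user sees when the data is read. The `POLICY` table, the role names, and the `query` function are hypothetical and exist only for illustration; they do not represent any particular product's API.

```python
# Single source of truth: the stored rows are never modified or copied to disk.
ROWS = [
    {"id": 1, "name": "Marie", "ssn": "123-45-6789"},
]

# Hypothetical policy: the roles allowed to see each column unmasked.
# Columns that are not listed here are visible to everyone.
POLICY = {
    "ssn": {"compliance"},
    "name": {"compliance", "support"},
}


def query(rows, role):
    """Return rows as seen by the given role, masking values at read time."""
    result = []
    for row in rows:
        visible = {}
        for column, value in row.items():
            allowed_roles = POLICY.get(column)
            if allowed_roles is None or role in allowed_roles:
                visible[column] = value
            else:
                visible[column] = "****"
        result.append(visible)
    return result
```

Calling `query(ROWS, "support")` and `query(ROWS, "analyst")` returns different views of the same stored rows, so only the policy needs to be maintained and audited.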
Deterministic Data Masking
Deterministic data masking is a straightforward approach that replaces certain values in a data set with a different, predetermined value. For example, the name “Marie” in a demographic data set could be replaced with the name “Maxine,” and that same replacement would be applied to every appearance of “Marie” in the data set.
The simplicity of this model is both its greatest asset and its greatest weakness. While easy to apply, the “code” is simple enough to be reverse-engineered, making it a potential target for attackers and increasing the risk of a data leak.
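The “Marie” to “Maxine” example can be sketched as a fixed lookup table in Python. The table contents and function name below are hypothetical. The last lines show why the approach is easy to reverse: anyone who learns the table can invert it.

```python
# Hypothetical substitution table: each real value always maps to the same fake one.
SUBSTITUTIONS = {"Marie": "Maxine", "John": "Jacob"}


def deterministic_mask(value):
    """Replace value with its predetermined substitute.

    Values missing from the table pass through unchanged in this sketch.
    """
    return SUBSTITUTIONS.get(value, value)


names = ["Marie", "John", "Marie"]
masked = [deterministic_mask(name) for name in names]

# An attacker who obtains or infers the table can simply invert it.
REVERSED = {fake: real for real, fake in SUBSTITUTIONS.items()}
recovered = [REVERSED.get(name, name) for name in masked]
```

Here `masked` contains “Maxine” in both positions where “Marie” appeared, which is the consistency this method is known for, while `recovered` matches the original `names` list.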
When to Use Deterministic Data Masking
This form of masking is often praised for its consistency. Since values are masked the same way across data stacks, users can expect their data to be masked and protected whenever they require access. It is also valued for its simplicity, although that simplicity can also be a weakness.
Deterministic data masking may be helpful to use in limited cases where data is not at an increased risk of breach. Regardless of the level of risk, however, it will still remain easier to reverse-engineer than more dynamic approaches to masking. Even though deterministic masking can be applied simply and logically, it alone may not be the best choice for protecting your sensitive data.
Real-Time (“On-the-Fly”) Data Masking
Real-time data masking takes a very conservative approach to protecting data. Instead of masking entire data sets while accessing or copying them like DDM and SDM, this method pulls and masks data ad hoc as it is requested by applications or users. It is applied when data is sent between production and testing or development environments, and requires much less storage than methods like SDM that require whole copies to be made.
This method is inherently more secure than deterministic and static data masking, as data is masked case-by-case to ensure protection and compliance.
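One way to picture masking in transit is a Python generator that masks each record as it is requested, without materializing a full masked copy. This is only a sketch under assumptions: the `mask_stream` name, the `"REDACTED"` placeholder, and the sample records are invented for illustration.

```python
def mask_stream(records, sensitive_fields):
    """Yield masked records one at a time as the consumer asks for them."""
    for record in records:
        yield {
            key: ("REDACTED" if key in sensitive_fields else value)
            for key, value in record.items()
        }


# Hypothetical records leaving a production system.
outgoing = iter([
    {"id": 1, "email": "marie@example.com", "country": "FR"},
    {"id": 2, "email": "alan@example.com", "country": "GB"},
])

# Each record is masked only when the receiving side pulls it.
masked_records = list(mask_stream(outgoing, {"email"}))
```

Because the generator handles one record at a time, nothing beyond the record in flight needs to be stored, in contrast to the full copies that SDM requires.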
When to Use Real-Time Data Masking
Given its ad hoc nature, real-time data masking has a similar flexibility to dynamic data masking. As implied by the name, it is done “on-the-fly,” which means it does not rely on systems that are set up in advance and applied passively. This can work well for a smaller data ecosystem that supports a variety of data use cases.
However, the drawback of real-time data masking’s ad hoc nature is that it is inherently less scalable than a dynamic form. Masking data on a case-by-case basis might work on a small scale, but as an organization grows and evolves its data stack, number of data sources, and number of data users, it becomes increasingly difficult and time-consuming to mask on-the-fly.
Which Is the Best Type of Data Masking?
Each of these types of data masking has its own merits. There are certain use cases in which static, deterministic, or real-time data masking might be the most effective technique for an organization to apply to its sensitive data.
However, there is only one type that has the scalability and flexibility to adapt to an organization’s growth and changes. Dynamic data masking can support data masking techniques ranging from k-anonymization to differential privacy and beyond, putting many effective masking tools at an organization’s disposal. These techniques, when enforced automatically at query time across the entirety of a data environment, ensure that sensitive data is protected against leaks or breaches at all times. And all this is done while maintaining a single source of truth for data, and without negatively affecting time-to-access or efficiency.
Immuta’s data access platform provides organizations with a holistic, consistent, and scalable model that automatically applies dynamic data masking to protect sensitive data. With room to adapt, our platform ensures that your data can be masked dynamically now and into the future.
To see how creating a policy in Immuta enforces data masking at scale, try our new hands-on walkthrough demo.