What Is Data Masking?
Before diving into best practices for data masking, it’s important to answer a basic question: what is data masking?
Data masking is a form of data access control that alters existing data in a data set to make a fake but ultimately convincing version of it. This allows sensitive data like social security numbers, credit card information, health data, and more, to be stored, transferred, and analyzed while remaining protected from leaks to potential attackers.
Data masking is a privacy-enhancing technology (PET) that can take a variety of forms and be applied through differing methodologies. Static data masking (SDM) alters data at rest, while dynamic data masking (DDM) occurs while data is streamed from its source to an analysis environment. Techniques such as k-anonymization, differential privacy, and encryption are all ways of achieving the desired protective effects.
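To make the distinction between static and dynamic masking concrete, here is a minimal Python sketch. The function names (mask_ssn, static_mask, dynamic_read) and the "keep the last four digits" rule are assumptions chosen for illustration, not a description of any particular product's behavior.

```python
def mask_ssn(ssn: str) -> str:
    """Replace every digit except the last four with 'X', keeping separators."""
    total_digits = sum(ch.isdigit() for ch in ssn)
    digits_seen = 0
    out = []
    for ch in ssn:
        if ch.isdigit():
            digits_seen += 1
            # Keep only the final four digits readable.
            out.append(ch if digits_seen > total_digits - 4 else "X")
        else:
            out.append(ch)
    return "".join(out)


def static_mask(records):
    """Static masking: build a masked copy that would be stored at rest."""
    return [{**r, "ssn": mask_ssn(r["ssn"])} for r in records]


def dynamic_read(records, user_is_privileged: bool):
    """Dynamic masking: the stored records are untouched; masking is
    applied on the way out, depending on who is reading."""
    for r in records:
        yield r if user_is_privileged else {**r, "ssn": mask_ssn(r["ssn"])}
```

In the static case the masked copy is what gets stored and shared. In the dynamic case the original values stay in place and each reader sees a version that depends on their privileges.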
So, why does this matter?
Why Is Data Masking Essential?
As we’ve noted, data masking is an important protective tool in an organization’s data stack. It provides the ability to proactively alter sensitive data that has been collected and stored in order to protect against any potential breach or leakage.
Another primary benefit of data masking is its role in keeping up with compliance and regulations. Nearly every modern organization is subject to the many data privacy and security regulations in effect today. Whether internal, contractual, or government-enforced, these rules and regulations are only increasing in number and relevance. When proactive steps like data masking are taken to protect data at the source, it is much easier to achieve and prove compliance with these rules.
What Are Data Masking Best Practices?
Data masking’s main purpose, then, is to help guarantee sensitive data security without inhibiting or compromising its accessibility. And although various types of data masking and data masking techniques exist, there are certain best practices that all organizations should follow in the pursuit of safe and effective masking.
Identify Your Sensitive Data
For masking to be effective, it’s essential to understand what data exists in your storage and analysis environments. To choose the proper masking type and technique, you need to know what you’re masking. Is it credit card numbers, addresses, or BMI data in a healthcare system’s data set? Each of these can be masked in ways that guarantee their protection and proper compliance with the relevant laws and regulations.
The easiest way to maintain consistent, up-to-date knowledge of your data is to facilitate sensitive data discovery and classification as data is introduced to your data stack. This gives data teams visibility and control over the type of data in their possession, and where it is being stored and analyzed. Teams can then better understand their data in the context of the regulations they are subject to, as well as the users who need to access sensitive data. Aggregating this information helps determine the who/what/where/when/why of the masking.
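As a rough illustration of discovery and classification at ingestion time, the Python sketch below tags columns by matching sample values against a few regular expressions. The pattern set, the 0.8 threshold, and the names classify_column and classify_table are all assumptions made for this example. Real discovery tooling relies on much richer detection than three patterns.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^(?:\d{4}[- ]?){3}\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def classify_column(values, threshold=0.8):
    """Return the tags whose pattern matches at least `threshold`
    of the non-empty values in a column."""
    non_empty = [str(v) for v in values if v not in (None, "")]
    if not non_empty:
        return set()
    tags = set()
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.add(tag)
    return tags


def classify_table(rows):
    """Group row values by column name, then classify each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: classify_column(vals) for name, vals in columns.items()}
```

Running a step like this as data arrives produces a map from column names to sensitivity tags, which later masking decisions can key off instead of relying on column names alone.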
Consider Referential Integrity
Referential integrity means that two or more tables can be joined on a common column or set of columns because the values in those columns match across the tables.
In some cases, you may want to preserve referential integrity even when data is masked. In other cases, you may want referential integrity destroyed in order to block “toxic” combinations of data that could result in privacy leaks. Masking techniques such as hashing and reversible masking can retain or destroy referential integrity depending on how salts and encryption keys are managed: reusing the same salt or key across tables keeps masked values joinable, while using different ones breaks the link. When this is done dynamically using DDM, the choice can be made at query time rather than fixed in the stored data, which can be very powerful.
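The effect of the salt or key on referential integrity can be sketched in a few lines of Python. This example uses a keyed hash (HMAC-SHA256) from the standard library as one possible masking function. The helper names mask_key, mask_column, and join, and the customer_id column used when calling them, are hypothetical.

```python
import hashlib
import hmac


def mask_key(value: str, secret: bytes) -> str:
    """Mask a join key with a keyed hash. The same value and secret
    always produce the same output; a different secret does not."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_column(rows, column, secret):
    """Return copies of the rows with one column replaced by its keyed hash."""
    return [{**row, column: mask_key(row[column], secret)} for row in rows]


def join(left, right, column):
    """Simple inner join of two lists of dicts on a shared column."""
    index = {}
    for r in right:
        index.setdefault(r[column], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]
```

If a customers table and an orders table are both masked on customer_id with the same secret, the masked values still match and the join returns the same pairings as before. Masking each table with a different secret yields values that no longer match, so the join returns nothing and the combination is blocked.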
Consider Governance and its Costs
Compliance frameworks and regulations, such as GDPR, CCPA, and HIPAA, may govern the handling of specific categories of information, placing restrictions on the processing and dissemination of data. It is therefore necessary to understand any applicable governance requirements.
This is important not only because frameworks often suggest or dictate masking approaches for governed categories, but also because the masking of select elements may lower the operational classification of data processing activity, thereby reducing compliance burden or allowing for broader sharing. In such cases, costly processes such as review and audit may be reduced or eliminated, lowering operational costs and time to value, and increasing the data’s overall availability.
Ensure Repeatability and the Ability to Scale
One could argue that this is the most essential part of creating a lasting data masking standard for your organization. Data masking should be viewed as a long-term solution to protecting your data from breach, so solutions should be implemented only if they have long-term potential.
The foundations of any masking standard should be built in a way that allows for repeatability and scaling. Masking techniques should be applicable to any new data in perpetuity, without needing to be overhauled or greatly adjusted. As data evolves and multiplies, the techniques used to protect it must be able to keep up. This means that masking techniques should be chosen and implemented only if they can be successful for your data needs both now and in the future.
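One way to picture repeatability is to attach masking rules to classification tags rather than to individual tables or columns. The Python sketch below assumes columns have already been tagged. The POLICY mapping, the redact and last_four rules, and the apply_policy helper are illustrative names, not part of any specific product.

```python
def redact(value):
    """Replace the value entirely."""
    return "REDACTED"


def last_four(value):
    """Keep the last four characters and star out the rest."""
    s = str(value)
    return "*" * max(len(s) - 4, 0) + s[-4:]


# Rules are keyed by classification tag, not by table or column name,
# so a newly added column with a known tag is covered without changes.
POLICY = {"ssn": last_four, "email": redact}


def apply_policy(row, column_tags, policy=POLICY):
    """Mask each column according to its tag; untagged columns pass through."""
    masked = {}
    for col, value in row.items():
        rule = policy.get(column_tags.get(col))
        masked[col] = rule(value) if rule else value
    return masked
```

Because the rules follow the tags, a new data source only needs to be classified. The masking logic itself does not have to be rewritten as the number of tables grows.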
Data Masking That Can Facilitate Best Practices
In short, organizations should build a data masking standard that facilitates sensitive data discovery, can maintain referential integrity among distinct data sources when necessary, considers the role of governance, and can be repeated at scale. These best practices, while distinct from one another, may be easier to achieve than you think.
Immuta’s Data Security Platform automatically implements sensitive data discovery and classification as new data is introduced into an environment, giving users the information they need to know about their data. The platform supports a variety of important dynamic data masking techniques, which can be applied automatically at query time through attribute-based access control policies. Dynamic policy enforcement means there is never a need to copy or manually mask data in the original sets. This ensures that the original data can remain referenceable, and mitigates irreversibility since the masking algorithms don’t live in the same place as the data. Most importantly, Immuta’s separation of policy from platform guarantees repeatability, meaning masking techniques will be applicable to all data as you grow and scale your data sources.
Want to experience how Immuta’s policies enable powerful and effective data masking? Try out our new self-guided walkthrough demo.