What Is Data Masking?
Why Do I Need Data Masking?
What Are Some Data Masking Techniques?
When Do I Need Data Masking? Top Use Cases
How Do I Apply Dynamic Data Masking?
What Is Data Masking?
Data masking, also referred to as obfuscation, is a form of data access control that alters existing sensitive information in a data set to produce a fake but still convincing version of it. This allows data sets containing sensitive information to be stored, accessed, and used while preserving the anonymity and safety of the information involved.
Data masking comprises a family of obfuscation techniques for controlling information disclosure. The choice of specific technique often depends upon the intended application. In determining which technique to apply, one must consider:
- Data Disclosure Risks: What fields and portions of values are sensitive? Who are the downstream recipients? What could an attacker infer from this information? Who may be harmed by the release of this information and how?
- Analytical Use Cases vs Masking Characteristics: Are downstream recipients and processes sensitive to formatting? Is there any information (such as the portion of a credit card number that encodes the issuing bank) that must be preserved downstream?
- Governance: Do compliance and/or regulatory frameworks such as GDPR, CCPA, and HIPAA apply? What restrictions, if any, do these frameworks place on the use or release of the data? Can masking lower the operational classification of a data processing activity, thereby reducing the compliance burden or allowing for broader sharing? What masking methods are acceptable for the specified application?
Why Do I Need Data Masking?
Privacy-enhancing technologies (PETs) are an increasingly important part of any modern data stack. Data masking is one of the most popular PETs and can take various forms.
The continued rise of data use and sharing in business and government also increases the risk of data breaches and leaks. According to the Identity Theft Resource Center (ITRC), 83% of the 1,862 data breaches in 2021 involved sensitive data. In an age where data breaches can impact organizations as large and secure as Facebook and LinkedIn, it is crucial that companies incorporate masking techniques into their data storage capabilities to maintain consumer safety and trust.
Masking sensitive data not only helps your organization protect consumers’ privacy, but also helps maintain trust between your organization and its consumers. Because the collection of personal data is now ubiquitous, consumers often implicitly trust that the information they give to organizations will be kept safe and secure. If this trust is betrayed, it can severely damage consumer confidence and relationships.
What Are Some Data Masking Techniques?
There are a number of data masking techniques used to protect sensitive data, but these are some of the most common:
k-Anonymization
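This technique is often classified as a “hiding in the crowd” approach. k-Anonymization generalizes or suppresses quasi-identifiers (attributes such as ZIP code, age, or gender that can identify someone when combined) until each record shares its quasi-identifier values with at least k-1 other records, so that no individual can be distinguished from the rest of a group of at least k.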
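The check behind k-anonymity can be sketched in a few lines of Python. This is a minimal illustration, assuming records are plain dictionaries and that ZIP code and age are the quasi-identifiers; the generalization rules (truncating ZIP codes to three digits, bucketing ages by decade), the function names, and the data are hypothetical choices for the example, not a prescribed scheme.

```python
from collections import Counter

def generalize(record):
    """Coarsen the quasi-identifiers: keep 3 ZIP digits and bucket age by decade."""
    decade = (record["age"] // 10) * 10
    return {
        "zip": record["zip"][:3] + "**",
        "age": f"{decade}-{decade + 9}",
        "diagnosis": record["diagnosis"],  # sensitive attribute, left unchanged
    }

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values appears at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(n >= k for n in counts.values())

patients = [
    {"zip": "02139", "age": 34, "diagnosis": "flu"},
    {"zip": "02138", "age": 37, "diagnosis": "asthma"},
    {"zip": "02141", "age": 52, "diagnosis": "diabetes"},
    {"zip": "02142", "age": 58, "diagnosis": "flu"},
]
quasi_ids = ["zip", "age"]

print(is_k_anonymous(patients, quasi_ids, k=2))  # False: every raw (zip, age) pair is unique
masked = [generalize(p) for p in patients]
print(is_k_anonymous(masked, quasi_ids, k=2))    # True: two groups of two records each
```

k-anonymity alone does not prevent attribute disclosure: if every record in a group shared the same diagnosis, that diagnosis would be revealed for everyone in the group. Extensions such as l-diversity were proposed to address this.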
Differential Privacy
Differential privacy adds carefully calibrated random “noise” to data or, more commonly, to the results of queries run against it. The noise is large enough that the output reveals very little about whether any single individual’s record is present, yet small enough that aggregate patterns in the original data remain usable.
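A classic way to achieve differential privacy for a counting query is the Laplace mechanism, sketched below. Adding or removing one person changes a count by at most 1, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy; smaller values of epsilon mean more noise and stronger privacy. The function names and data are illustrative assumptions, and the sampling relies on the fact that the difference of two independent exponential variables with the same mean is Laplace-distributed.

```python
import random

def laplace_noise(scale, rng=random):
    """Draw from Laplace(0, scale) as the difference of two Exp variables with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng=random):
    """Noisy count of items matching `predicate`, via the Laplace mechanism.

    A count has sensitivity 1 (one person changes it by at most 1), so noise
    with scale 1 / epsilon yields epsilon-differential privacy for this query.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

ages = [23, 35, 41, 29, 62, 57, 33, 48]
rng = random.Random(7)  # seeded only so the example is reproducible
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # the true count is 4; the printed value is 4 plus random noise
```

Production differential privacy libraries also track a privacy budget across repeated queries and guard against floating-point pitfalls in noise sampling, both of which this sketch ignores.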
Encryption
Encryption scrambles the values of the data in a data set, requiring a specific key to decrypt them and restore their original form. Without the key the data is unusable, so even if encrypted data is accessed in an unauthorized breach, its contents remain protected.
Tokenization
Similar to encryption, tokenization replaces specific data values with a “token” that has no extrinsic value or meaning. Unlike ciphertext, however, a token is typically not derived mathematically from the original value; instead, the mapping between tokens and original values is kept separate from the data set, often in a secured token vault behind firewalls. Without authorized access to that mapping, the sensitive values cannot be recovered.
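The following sketch shows vault-style tokenization, assuming an in-memory dictionary stands in for the separately secured vault described above. The TokenVault class and its method names are hypothetical. Tokens are random rather than derived from the input, so a token by itself reveals nothing; reusing the same token for a repeated value keeps joins and group-bys working on tokenized data.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory vault; in practice this mapping lives in a separately secured service."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Return the existing token for a repeated value so joins still line up.
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
            if token not in self._token_to_value:   # guard against an unlikely collision
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only code with access to the vault can map a token back to its value.
        return self._token_to_value[token]

vault = TokenVault()
token = vault.tokenize("123-45-6789")
print(token)                    # a random token: tok_ followed by 16 hex digits
print(vault.detokenize(token))  # 123-45-6789
```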
Nulling/Redaction
Although they are distinct forms of data masking, nulling and redaction take very similar approaches. Nulling simply removes data from a data set based on access permissions, showing values a given user is not permitted to see as “null.” Redaction likewise removes or substitutes values based on user permissions, for example replacing them with a fixed placeholder such as “REDACTED.”
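The difference between the two can be shown with a minimal sketch, assuming each record is a dictionary and each user's permissions are a set of visible field names. The function names, the placeholder string, and the data are illustrative.

```python
def null_fields(record, permitted):
    """Nulling: fields the user may not see become None (NULL)."""
    return {field: (value if field in permitted else None) for field, value in record.items()}

def redact_fields(record, permitted, placeholder="REDACTED"):
    """Redaction: fields the user may not see are replaced with a placeholder."""
    return {field: (value if field in permitted else placeholder) for field, value in record.items()}

customer = {"name": "Ana Silva", "email": "ana@example.com", "plan": "premium"}
analyst_fields = {"plan"}  # hypothetical permission set for an analyst

print(null_fields(customer, analyst_fields))
# {'name': None, 'email': None, 'plan': 'premium'}
print(redact_fields(customer, analyst_fields))
# {'name': 'REDACTED', 'email': 'REDACTED', 'plan': 'premium'}
```

A redaction placeholder makes it explicit that a value exists but was withheld, while a null can be confused with a value that was genuinely missing; on the other hand, nulling keeps numeric and date columns type-compatible.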
Generalization/Averaging
Data generalization and averaging, another pair of similar techniques, both replace the actual values in a data set with “zoomed out” values, so that specific values can no longer be attributed to specific individuals. Generalization replaces a value with a broader category, such as replacing exact heights in a medical data set with height ranges; averaging replaces individual values with the average of a group.
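Both techniques can be sketched briefly. The sketch assumes 10 cm height bands and uses a hospital ward as the grouping for averaging; these choices, the function names, and the data are hypothetical.

```python
from statistics import mean

def generalize_height(cm, width=10):
    """Generalization: replace an exact height with a range, e.g. 172 -> '170-179 cm'."""
    low = (cm // width) * width
    return f"{low}-{low + width - 1} cm"

def average_by_group(rows, group_key, value_key):
    """Averaging: replace each value with the mean value of its group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    group_means = {g: mean(vals) for g, vals in groups.items()}
    return [{**row, value_key: group_means[row[group_key]]} for row in rows]

patients = [
    {"ward": "A", "height_cm": 172, "weight_kg": 70},
    {"ward": "A", "height_cm": 165, "weight_kg": 80},
    {"ward": "B", "height_cm": 181, "weight_kg": 90},
    {"ward": "B", "height_cm": 178, "weight_kg": 84},
]

print([generalize_height(p["height_cm"]) for p in patients])
# ['170-179 cm', '160-169 cm', '180-189 cm', '170-179 cm']
print([p["weight_kg"] for p in average_by_group(patients, "ward", "weight_kg")])
# [75, 75, 87, 87]
```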
When Do I Need Data Masking? Top Use Cases
While data masking is used for many distinct use cases across industries, here are a few of the most common scenarios:
Internal & External Data Sharing
Sharing data, whether between teams at the same organization, departments at different companies, or governments across borders, opens up avenues for data leaks or breaches. As data sharing and subsequent leaks involving sensitive data increase, preemptive measures must be taken to protect consumers from the repercussions of re-identification.
Beyond the desire to keep data safe, privacy laws and regulations mandating its protection continue to proliferate. Many common data sharing use cases, ranging from third-party collaboration to simple internal exchanges, fall under the purview of these laws, contractual obligations, and other protective requirements. Failure to comply with these measures can bring an array of monetary, legal, and reputational consequences against an organization.
Applying data masking is a proactive step towards regulatory compliance and against breaches. By making data masking a best practice in your data workflow, individuals can be protected regardless of where or with whom their data is shared. This allows data teams to mitigate and manage risk without hurting productivity or data shareability.
Securing Sensitive Financial Information
Financial services institutions are built on trust. Whether it’s bank account and routing numbers or credit card details, individuals’ financial information is extremely sensitive. This information must be adequately protected to maintain consumer trust and safety.
A variety of regulations and standards also require the strict protection of this data. The EU’s GDPR takes a broad approach to consumers’ data privacy, which extends to their financial PII. More specifically, the Payment Card Industry Data Security Standard (PCI DSS) requires the protection of cardholder data across the processing, storage, and transmission stages of transactions. The Gramm-Leach-Bliley Act (GLBA) requires U.S. financial institutions to safeguard their customers’ information. Each of these frameworks, like many other data security compliance laws, can carry heavy monetary and business penalties if not followed.
Data masking helps ensure that sensitive financial data is hidden appropriately within FinServ and FinTech data sets. When information like card and account numbers is sufficiently altered, it cannot be related back to individuals in the event of unauthorized access.
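As an illustration of masking that still preserves useful structure, the sketch below hides the middle digits of a card number while keeping its formatting, its leading digits (the issuer identification number, which identifies the issuing institution), and the last four digits that customers commonly use to recognize a card. Keeping the first six and last four digits is a widely used convention, but the right split depends on the rules that apply to you; the function and its defaults are assumptions for illustration.

```python
def mask_card_number(pan, keep_first=6, keep_last=4, mask_char="*"):
    """Mask the middle digits of a card number, keeping separators, prefix, and suffix."""
    n_digits = sum(c.isdigit() for c in pan)
    if n_digits <= keep_first + keep_last:
        raise ValueError("too few digits to mask with these settings")
    out, seen = [], 0
    for c in pan:
        if c.isdigit():
            seen += 1
            # Mask only digits strictly between the kept prefix and the kept suffix.
            if keep_first < seen <= n_digits - keep_last:
                out.append(mask_char)
                continue
        out.append(c)  # kept digits and separators pass through unchanged
    return "".join(out)

print(mask_card_number("4111 1111 1111 1111"))  # 4111 11** **** 1111
```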
Protecting Sensitive Health Information
Another form of extremely sensitive data is protected health information, or PHI. This data is created and collected by healthcare providers and life sciences organizations, and can include anything from medical conditions and history to demographic data and insurance information.
The protection of PHI, like that of financial data, is required by a range of modern regulations. The best known of these is the Health Insurance Portability and Accountability Act (HIPAA), which governs how covered entities (such as health plans and most healthcare providers) and their business associates protect PHI, with the aim of keeping patients’ personal information from being exposed or misused, including in the event of a breach. Failure to comply with these regulations can result in fines and other penalties.
Here, PHI can be masked in healthcare data sets so that the information cannot be connected back to individual patients or accounts. This is especially important for personal, demographic, and medical condition data.
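One concrete way to mask PHI is the HIPAA Privacy Rule’s Safe Harbor de-identification method, which requires removing 18 categories of identifiers. The sketch below applies only a few of its rules (dropping names and phone numbers, keeping only the first three ZIP digits, keeping only the year of a birth date, and grouping ages over 89 into a single 90+ category) and is not a complete or compliant implementation. The field names are assumptions, and restricted_zip3 is a hypothetical parameter that would have to be filled from current Census data, since three-digit ZIP prefixes covering 20,000 or fewer people must be replaced with 000.

```python
from datetime import date

def safe_harbor_sketch(record, restricted_zip3=frozenset()):
    """Apply a FEW Safe Harbor-style rules; not a complete HIPAA de-identification.

    restricted_zip3 (hypothetical) holds 3-digit ZIP prefixes whose combined area
    has 20,000 or fewer people, which would come from current Census data.
    """
    out = dict(record)
    out.pop("name", None)   # names are direct identifiers
    out.pop("phone", None)  # so are telephone numbers
    zip3 = record["zip"][:3]
    out["zip"] = "000" if zip3 in restricted_zip3 else zip3
    if record["age"] > 89:
        out["age"] = "90+"        # ages over 89 collapse into a single category
        out["birth_year"] = None  # a birth year would reveal the exact age
    else:
        out["birth_year"] = record["birth_date"].year  # only the year of a date is kept
    del out["birth_date"]
    return out

patient = {"name": "J. Doe", "phone": "555-0100", "zip": "02139",
           "birth_date": date(1931, 5, 2), "age": 92, "diagnosis": "hypertension"}
print(safe_harbor_sketch(patient))
# {'zip': '021', 'age': '90+', 'diagnosis': 'hypertension', 'birth_year': None}
```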
How Do I Apply Dynamic Data Masking?
With a range of important use cases and regulatory measures requiring strict compliance, data masking is an essential component of a modern data stack. However, today’s data ecosystems are rarely stationary. Data generation continues and new use cases develop, but privacy requirements and the need for data masking remain the same. How can this need be met in such a fluid environment?
Immuta’s Data Access platform bridges this gap and provides a future-proof, scalable model for modern organizations. With dynamic data masking capabilities that enforce PETs like automated k-anonymization and differential privacy, Immuta can facilitate masking automatically across diverse data ecosystems and ensure sensitive data’s security without compromising its access and use. Immuta provides the consistent, holistic, and scalable data masking capabilities required for current and future data use.
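Independent of any particular product, the core idea of dynamic masking is that masking policies are evaluated at query time against the requesting user’s attributes, while the stored data stays unchanged. The sketch below illustrates that idea; it is not Immuta’s API or policy language, and the policy table, role names, key, and functions are all hypothetical. Emails are pseudonymized with a keyed hash (HMAC) so the same email always masks to the same value, and salaries are nulled for anyone outside HR.

```python
import hashlib
import hmac

MASKING_KEY = b"example-key-only"  # hypothetical; a real key belongs in a secrets manager

def pseudonymize(value):
    """Keyed hash: the same input always masks to the same output, so joins still work."""
    return hmac.new(MASKING_KEY, str(value).encode(), hashlib.sha256).hexdigest()[:12]

# Hypothetical policies: column -> (roles that may see raw values, masking function).
POLICIES = {
    "email": ({"support"}, pseudonymize),
    "salary": ({"hr"}, lambda value: None),  # nulled for everyone outside HR
}

def read_with_masking(rows, user_roles):
    """Apply policies at read time; the stored rows are never modified."""
    view = []
    for row in rows:
        masked = {}
        for column, value in row.items():
            policy = POLICIES.get(column)
            if policy is None or policy[0] & user_roles:
                masked[column] = value             # no policy, or the user holds an allowed role
            else:
                masked[column] = policy[1](value)  # apply the column's masking function
        view.append(masked)
    return view

employees = [{"name": "Ana", "email": "ana@example.com", "salary": 98000}]
print(read_with_masking(employees, {"hr"}))       # salary visible, email pseudonymized
print(read_with_masking(employees, {"support"}))  # email visible, salary is None
```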
To examine how policy creation in Immuta enforces data masking at scale, try our new self-guided walkthrough demo.