Immuta Spring ’19 Release: Automating Compliant Data Collaboration

Data analytics rarely occurs in a vacuum – analysts work together sharing scripts, dashboards, and models as they attempt to extract value from their organization’s most critical business asset: data.

A common misstep is limiting controls to the raw data and failing to consider the security and privacy implications of data in use. That leaves analysts and data scientists struggling to work together with confidence that they are handling sensitive data correctly.

Born out of concepts learned in the U.S. Intelligence Community, Immuta does not stop at protecting the raw data—powerful workflows and abstractions provided by Immuta allow for compliant collaboration within the most scrutinized business environments.

In our latest release, we’ve introduced the following new compliant collaboration features:

NEW FEATURE

Automated Policy Inheritance

Creating new data from existing protected data is a compliance risk. What protections should the new data have? Who can see it? Immuta solves this problem by passing on the policies from the original sources.

When analysts work with data, it commonly results in a new data source that reflects the work that they accomplished and now wish to share. Considering highly variant policies and access controls, the challenge becomes sharing these outputs in a compliant way.

This typically requires the analyst to undertake a manual process with their legal and data engineering teams to appropriately inherit policies and enforce them for every output they create. This is not a scalable solution due to the level of manual intervention required, and we often see organizations give up, and ultimately prohibit this kind of data sharing.

Immuta is excited to announce Automated Policy Inheritance, which solves this challenge. Using Equalized Projects, introduced in v2.3, Immuta can understand not only which data sources could have been used to create a new data source, but also what policies were in place across all users in the project when it was created. This allows Immuta to dynamically apply and enforce the appropriate policies from all upstream data sources. With this technique, analysts can expose their new data sources back in Immuta, without involving legal or IT, and automatically inherit the appropriate policies from the parent data sources. These inherited policies are not restricted to the project space; they also apply outside the project, which allows you to a) share data faster, and b) avoid duplicated effort, since the output can be shared rather than recreated over and over again.
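To make the idea concrete, here is a minimal sketch of policy inheritance, not Immuta’s actual implementation. It assumes the simplest possible rule: a derived data source carries the union of the policies on every upstream source. The `Policy` and `DataSource` classes and the `derive` helper are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: a rule applied to one column."""
    column: str
    rule: str  # e.g. "mask" or "null"


@dataclass
class DataSource:
    """Hypothetical data source with its own policies and upstream parents."""
    name: str
    policies: frozenset = frozenset()
    parents: tuple = ()


def inherited_policies(source):
    """Collect the policies on a source and on every upstream parent."""
    collected = set(source.policies)
    for parent in source.parents:
        collected |= inherited_policies(parent)
    return frozenset(collected)


def derive(name, parents):
    """Create a derived source that inherits all upstream policies (assumed rule: union)."""
    derived = DataSource(name=name, parents=tuple(parents))
    derived.policies = inherited_policies(derived)
    return derived


patients = DataSource("patients", frozenset({Policy("ssn", "mask")}))
claims = DataSource("claims", frozenset({Policy("diagnosis", "null")}))
risk_scores = derive("risk_scores", [patients, claims])
```

In this toy model, `risk_scores` ends up with both the `ssn` and `diagnosis` policies without anyone assigning them by hand, which is the behavior the analyst would otherwise have to negotiate manually.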

NEW FEATURE

Re-Identification Requests: On-Demand De-Masking of Sensitive Data

Immuta makes it easy for users to request access to data, request to have sensitive values unmasked, and propose new projects. These built-in workflows remove the manual, ad-hoc approval flows that typically take months of meetings and decisions.

“With Immuta’s new re-identification feature, it is now possible for our data analysts to identify outliers in anonymized datasets, and start a well-controlled workflow to mark those specific samples to an authorized admin for de-anonymized investigation in a way that fully conforms with health industry data-privacy guidelines.”

— Halim Abbas, Chief AI Officer, Cognoa

Imagine an analyst works with hospital patient data to predict the risk of an imminent heart attack. The analyst has data for thousands of patients and, appropriately, all of the PII is masked from the analyst. The analyst finds three critical patients who seem to be at very high risk and now needs to alert them. But how, when their identifiers are masked? Re-identifying the patients is challenging because the usual options are cumbersome:

  • The analyst hands off the analysis script or process to a user who has access to patient PII.
  • That user then re-runs the entire analysis with the PII exposed to find the three outliers (but also sees all the other patients).
  • Alternatively, the hospital stores the patients’ data with an ID scheme that allows a patient to be identified without revealing PII. This is a good approach but is rarely done, as we find that many organizations reference individuals using their PII.

In our latest release, we’ve introduced two new masking types that can be reversed, as well as a workflow for reversing those masks for the purpose of re-identification. Focusing on the masking types first, Immuta has introduced:

  • Reversible Masking
  • Format Preserving Masking

Immuta defines “Masking” as a technique to make data in a column similar but inauthentic in some way, so there’s a level of utility loss for privacy gain. How similar the data looks is determined by the masking type chosen, which allows the policy author to make tradeoff decisions between data utility and privacy.

With both Reversible Masking and Format-Preserving Masking, the raw values are swapped out for consistent values. For example, with Reversible Masking, ‘test’ is always replaced with ‘apoi0293rpjadofina=’. Replacing the direct identifier with a token that can still be tracked, counted, etc. allows analysis without revealing the underlying sensitive data. Format-Preserving Masking takes this a step further by replacing the sensitive data in a way that preserves the data format. This can be important if the format has some relevance to the analysis at hand, for example, if you need to retain the integer column type or if the first 6 digits of a 12-digit number have an important meaning.
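The properties described above (consistent output, same format, reversible with a key) can be illustrated with a toy Feistel construction over digit strings. This is a sketch for intuition only: it is not Immuta’s algorithm, it is not the standardized NIST FF1 mode, and it should not be used to protect real data. The function names `mask_digits` and `unmask_digits` and the key are assumptions made up for this example.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so the two halves end up back in their original positions


def _prf(key: bytes, round_no: int, half: str) -> int:
    """Keyed pseudo-random function: HMAC-SHA256 over the round number and one half."""
    digest = hmac.new(key, bytes([round_no]) + half.encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def _check(value: str) -> None:
    if len(value) < 2 or not (value.isascii() and value.isdigit()):
        raise ValueError("expected a string of at least two ASCII digits")


def mask_digits(value: str, key: bytes) -> str:
    """Map a digit string to another digit string of the same length."""
    _check(value)
    u = len(value) // 2
    v = len(value) - u
    a, b = value[:u], value[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half produced this round
        c = (int(a) + _prf(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def unmask_digits(masked: str, key: bytes) -> str:
    """Invert mask_digits by running the rounds backwards with the same key."""
    _check(masked)
    u = len(masked) // 2
    v = len(masked) - u
    a, b = masked[:u], masked[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _prf(key, i, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b


key = b"demo-key-not-for-production"  # hypothetical key
account = "123456789012"
# Keep the meaningful first 6 digits and mask only the remaining 6.
masked = account[:6] + mask_digits(account[6:], key)
restored = masked[:6] + unmask_digits(masked[6:], key)
```

Because the mapping is deterministic for a given key, the same raw value always produces the same masked value, so joins and counts still work on the masked column. Only a holder of the key can run `unmask_digits` to recover the original.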

Should the analysis require eventual re-identification of a record, you can choose one of these two masking types. When you choose the masking type, Immuta requires you to specify the users to whom the policy will not apply; those are the users who will be able to service unmask requests (aka re-identification). Then, as the analyst reaches conclusions, they can request through Immuta that the values of interest be unmasked. The unmasking is done by an approved user, the unmasker, and happens dynamically based on the algorithm; no data is stored in Immuta. Note that the unmasker can take action with the unmasked value without ever sharing it back with the requesting analyst. Both the request to have the data unmasked and the execution of that request are fully audited in Immuta.
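The shape of that workflow (an analyst requests, an exempt user fulfills, and both steps are audited) can be sketched in a few lines. This is an illustrative model only, not Immuta’s API: the `UnmaskWorkflow` class, its fields, and the string-reversal stand-in for the real unmasking algorithm are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class UnmaskWorkflow:
    """Hypothetical re-identification workflow with an audit trail."""
    unmaskers: Set[str]            # users the masking policy does not apply to
    unmask: Callable[[str], str]   # reverses the mask; computed on demand, nothing stored
    audit_log: List[Tuple[str, str, Tuple[str, ...]]] = field(default_factory=list)

    def request(self, analyst: str, masked_values: List[str]) -> dict:
        """An analyst asks for specific masked values to be re-identified."""
        self.audit_log.append(("requested", analyst, tuple(masked_values)))
        return {"analyst": analyst, "values": list(masked_values), "status": "pending"}

    def fulfill(self, req: dict, unmasker: str) -> List[str]:
        """Only an exempt user may unmask; every attempt is audited."""
        values = tuple(req["values"])
        if unmasker not in self.unmaskers:
            self.audit_log.append(("denied", unmasker, values))
            raise PermissionError(f"{unmasker} is not exempt from the masking policy")
        req["status"] = "fulfilled"
        self.audit_log.append(("unmasked", unmasker, values))
        return [self.unmask(v) for v in values]


# Stand-in "mask": string reversal, used here only to keep the example self-contained.
workflow = UnmaskWorkflow(unmaskers={"dr_lee"}, unmask=lambda s: s[::-1])
req = workflow.request("analyst_1", ["321-tp"])
patient_ids = workflow.fulfill(req, "dr_lee")
```

Note that `fulfill` returns the unmasked values to the unmasker, not to the requesting analyst, so the unmasker can act on them (for example, by alerting the patient) without the analyst ever seeing the PII.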

NEW FEATURE

Fingerprints: Capture the Impact of Data Policy Changes on Downstream Data Users

Capture snapshots of what your data looks like at specific points in time. These summary statistics allow you to understand differences in data over time, and how the policy changes that impact them may affect output from your projects.

As our Chief Data Scientist says, “your model will perform best the first day you deploy it and will get progressively worse over time”.  Why is that? Put simply, the world changes, data changes, but also with our current regulatory environment, data policies change.

Similar to the notion of creating new data sources, analysts also create “analytical outputs”. Analytical outputs can be things like a machine learning model, a BI dashboard, or something as simple as an analysis summary in a Word document. It can be anything derived from your raw data. Immuta does not manage the creation of those outputs. However, the moment an output is created can be a critical “checkpoint” in time for your organization.

Changes to data and policies can have a large impact on the decisions that were made in the past or decisions that are being made by models in real time. Without Immuta, organizations struggle to understand what data went into which model. Taking that a step further, it’s even harder to understand how the data that went into those models is changing.

To solve this problem, Immuta now allows analysts to create data checkpoints, or what we call Fingerprint Versions. These are summary statistics about data at a critical point in time, such as when you expose one of these analytical outputs to the world. Fingerprint Versions are specific to a project and record the policies in place on the data when they are captured. Should a policy change, the analyst can inspect that Fingerprint Version and immediately understand the statistical shift in the data, as well as the difference between the legacy and current data policies. Imagine a policy changed from applying Format-Preserving Masking to a column to simply making all values in that column NULL; that would have a large impact on your downstream analysis. This can help diagnose problems in models or help determine when a new analytical output is required to remain compliant, accurate, or both.
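As a rough illustration of the idea, the sketch below captures a few summary statistics for one column along with the policy in force, then compares two snapshots. The statistics chosen, the `fingerprint` and `diff` helpers, and the sample values are all assumptions for this example; Immuta’s actual Fingerprint contents are not described here.

```python
import statistics


def fingerprint(values, policy):
    """Snapshot summary statistics for one column, plus the policy in force."""
    present = [v for v in values if v is not None]
    return {
        "policy": policy,
        "count": len(values),
        "null_fraction": (len(values) - len(present)) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "mean": statistics.fmean(present) if present else None,
    }


def diff(old, new):
    """Return only the fields that changed, as (old, new) pairs."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}


# Column as seen by the project under the original policy...
before = fingerprint([52, 61, 47, 61], policy="format-preserving mask")
# ...and after the policy is changed to NULL out every value.
after = fingerprint([None, None, None, None], policy="null")

changes = diff(before, after)
```

Here `changes` reports the new policy alongside the statistical shift (every value now missing, no distinct values, no mean), while the unchanged row count is omitted. That is the kind of signal that tells an analyst a downstream model or dashboard needs to be revisited.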

SEE IT IN ACTION!

Compliant Data Collaboration Webinar

Join me on Wednesday, May 15 at 2:00 p.m. ET for a live demo of our platform and the new features announced today.

You can register for the webinar here.

***

Steve Touw is the co-founder and CTO of Immuta. He has a long history of designing large-scale geotemporal analytics across the U.S. intelligence community — including some of the very first Hadoop analytics, and frameworks to manage complex multi-tenant data policy controls. Previously, Steve was the CTO of 42Six Solutions (acquired by Computer Sciences Corporation), where he led a large Big Data services engineering team.