Over the past year we’ve seen our larger enterprise customers across regulated industries start to utilize the power of the cloud to accelerate their data science programs. These users are drawn to the cloud’s elastic nature, along with the ability to quickly experiment with powerful new storage, compute services and machine learning frameworks to drive better business outcomes. With the major cloud players rapidly introducing new innovations, plus the flexibility of cloud infrastructure, it’s easy to see why enterprises are increasingly drawn to training their models in the cloud.
Despite the perceived benefits of cloud-based data science, we see enterprises struggle in three key areas:
- Time – Data required for training models typically needs to be moved and copied from on-premises to the cloud. Decisions on where that data should land and the compliance concerns surrounding the move can be complex.
- Risk – Your data living on-premises had time to “marinate,” and custom apps were built around each data source to preserve legal and compliance controls. Those legacy apps aren’t moving to the cloud – that would defeat the purpose of the migration. However, your data must move, leaving it with minimal or no controls.
- Money – Data policy enforcement needs to be done manually, by creating “anonymized” copies of data sets, across various user types. The more user and policy combinations you have, the more copies you need to make, store, and manage in the cloud. We’ve seen this balloon the costs for training AI models by 10x, eliminating any of the perceived benefits of utilizing the cloud for data science.
We worked closely with our enterprise users to help them overcome these challenges in our latest 2.3 release.
Immuta has always solved the above three problem areas by enforcing dynamic policy controls on databases, Hadoop, and file systems. This provides a consistent control plane that allows data scientists, legal and compliance personnel, and data owners to work together in a frictionless, symbiotic environment that fosters sharing.
So what have we enhanced in our 2.3 release?
One of the primary features is our support for batch workloads on AWS using EMR and S3. This is a major benefit to our customers because they can now “dump” their data in S3 and enforce fine-grained controls on that data when running transient compute using EMR. This can reduce storage costs significantly – typically by 60% or more.
Solving for Time, Risk and Money
Going back to our three struggle areas, here’s how our new 2.3 release directly addresses each:
- Time – Data can be moved to the cloud quickly, because there is a broader plan for enforcing access controls consistently through Immuta. S3 also provides a landing zone that is easily accessible through multiple AWS services beyond EMR, which Immuta also supports (such as API, Redshift Spectrum, Athena).
- Risk – Complex data controls can be built in Immuta consistently no matter the storage platform or processing engine. Immuta simplifies the entire risk management process so that risk can be thoroughly understood, and easily controlled.
- Money – By not requiring new copies of data for every policy/user combination, and by allowing multi-tenancy in your compute engines, Immuta can significantly reduce costs by applying policies dynamically during compute.
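To make the “no extra copies” point concrete, here is a minimal sketch of the general idea of applying a policy at read time instead of materializing one anonymized copy per user type. It is not Immuta’s implementation or API; the records, the role names, and the `read_as` helper are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]
Policy = Callable[[Row], Row]

# A single stored copy of the data (hypothetical records).
STORED_ROWS: List[Row] = [
    {"name": "Ada", "ssn": "123-45-6789", "state": "MD"},
    {"name": "Bob", "ssn": "987-65-4321", "state": "VA"},
]


def redact_ssn(row: Row) -> Row:
    """Return a new row with the ssn column redacted."""
    return {**row, "ssn": "REDACTED"}


def no_change(row: Row) -> Row:
    """Return an unmodified copy of the row."""
    return dict(row)


# Hypothetical mapping from a user's role to the policy applied on read.
POLICY_BY_ROLE: Dict[str, Policy] = {
    "analyst": redact_ssn,
    "compliance": no_change,
}


def read_as(role: str, rows: Iterable[Row] = STORED_ROWS) -> Iterator[Row]:
    """Yield rows with the role's policy applied; the stored data is never modified."""
    policy = POLICY_BY_ROLE[role]
    for row in rows:
        yield policy(row)
```

Both roles read from the same stored rows, so adding a new policy or user type adds an entry to the mapping rather than another data set to store and manage.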
You can find a video of this feature here:
Additional 2.3 Features to Streamline Data Science
Our 2.3 release includes additional features to aid our customers with streamlining their data operations:
Advanced Project Collaboration: We often see scenarios where several different analysts wish to collaborate with one another, yet have no way to do so because they all see different data due to variations in their access levels. Immuta can now solve this problem by dynamically “equalizing” all users to the same level from within a project, providing data consistency for the project collaborators. That means all users see the same data within a project they’re working under, consistent with all the policies and rules on that data.
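One way to picture “equalizing” is as taking the entitlements that every project member has in common. The sketch below assumes exactly that: each user holds a set of attributes, and the project level is the intersection of those sets. How Immuta actually chooses the equalized level is not described here, so treat the function name and the attribute model as hypothetical.

```python
from typing import Dict, FrozenSet, Set


def equalized_attributes(members: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return the attributes held by every project member (assumed equalization rule)."""
    if not members:
        return frozenset()
    attribute_sets = iter(members.values())
    common = set(next(attribute_sets))
    for attrs in attribute_sets:
        common &= attrs  # keep only what this member also holds
    return frozenset(common)
```

For example, if one collaborator holds `{"pii", "finance"}` and another holds only `{"finance"}`, both would work at the `{"finance"}` level inside the project and therefore see the same data.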
Watch this feature in action here:
Complex policy logic enhancements: Immuta already supports a broad range of complex policy enforcement controls, and we always strive to add to our inventory and expand our flexibility. We continued to do so in this release through:
- Additional complex conditions on when policies should or should not be applied.
- New support for conditional masking / cell-level security. With this feature, users can now drive masking of a column based on the value in another column in that same row.
- Support for variations of the same policy action. In 2.3, for example, Immuta can now apply policies like “mask Column 1 using regex for members of group A, otherwise mask Column 1 using hashing for everyone else.”
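The last two policy types can be illustrated with a small sketch. It shows the logic only and is not Immuta’s policy syntax or API: the column names (`consent`, `diagnosis`, `phone`), the choice of SHA-256 for hashing, and the digit-replacing regex are assumptions made for the example.

```python
import hashlib
import re
from typing import Dict, Set

Row = Dict[str, str]


def mask_regex(value: str) -> str:
    """Replace every digit with 'X' (an example regex mask)."""
    return re.sub(r"\d", "X", value)


def mask_hash(value: str) -> str:
    """Replace the value with its SHA-256 hex digest (an example hash mask)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_conditional_mask(row: Row) -> Row:
    """Cell-level security: mask `diagnosis` only when `consent` in the same row is 'no'."""
    if row["consent"] == "no":
        return {**row, "diagnosis": "MASKED"}
    return dict(row)


def apply_group_variation(row: Row, user_groups: Set[str]) -> Row:
    """Mask `phone` with regex for members of group A, with hashing for everyone else."""
    masker = mask_regex if "A" in user_groups else mask_hash
    return {**row, "phone": masker(row["phone"])}
```

In both functions the decision is made per row and per user at read time, which is what lets one stored column serve several masking behaviors.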
View these complex policy types in action here:
Support for Multiple Approving Parties: For entitlement workflows to data that require manual approval, Immuta policies can now enforce more than one approver and allow for specific users to approve within a permission type. In practice, this makes it easy for users to only be able to access a data source if they’ve been approved by, say, the data owner and a member of the group “Compliance Personnel,” as many of our customers require.
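The data owner plus “Compliance Personnel” example can be sketched as a simple check over recorded approvals. This is an illustration of the rule, not Immuta’s workflow engine: the `AccessRequest` class and `is_approved` function are hypothetical, and we assume the group approval must come from someone other than the data owner so that two distinct people sign off.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AccessRequest:
    requester: str
    approvals: Set[str] = field(default_factory=set)  # users who have approved so far


def is_approved(
    request: AccessRequest,
    data_owner: str,
    group_members: Dict[str, Set[str]],
    required_group: str = "Compliance Personnel",
) -> bool:
    """Grant access only when the data owner AND a member of the required group approved."""
    owner_ok = data_owner in request.approvals
    members = group_members.get(required_group, set())
    # Assumption: the group approver must be a different person than the data owner.
    group_ok = bool((request.approvals & members) - {data_owner})
    return owner_ok and group_ok
```

A request approved only by the data owner, or only by a compliance reviewer, stays pending until the other party approves.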
Bulk Action Requests: Immuta now enables data owners to expose data sources in bulk from a database connection, empowering users to disable, delete and restore data sources in bulk.
Our new 2.3 features further streamline data operations – whether on-premises or when migrating to the cloud – and will free your data scientists, legal and compliance personnel, and data owners to work together in a frictionless environment that empowers innovation. Overall, we’re confident that our 2.3 release will empower our customers to take advantage of their most important asset: their data.