
New in 2021.3: Performance & Scale for Secure Workloads in Databricks & More

At Immuta, we have a culture of innovation and continuous improvement. We listen to our customers’ pressing data access control, privacy, and governance challenges, then build new features and improve on existing ones to help overcome those challenges.

Our latest release introduces new features and enhancements that help accelerate our customers’ cloud journeys, focusing primarily on performance and scalability, usability, and consistency for data access management in modern cloud stacks. In this blog, we’ll go into detail on each to help you unlock even more value from your data.

Databricks Innovations and Enhancements

Improved Usability

Simplified Databricks Configuration
You can now configure and test Immuta’s integration with Databricks with the click of a button. There’s no need to struggle with manual, error-prone configuration processes, so you can save time, reduce risk, and get up and running with Databricks and Immuta as quickly as possible.

Struct Data Type Support
For organizations looking to handle complex data types at scale, Immuta has added policy support for Databricks structs. This means you can mask values within structs, as well as drive row-level security based on field values in structs.
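
To make the idea concrete, here is a minimal sketch in plain Python of masking one field inside a nested record and filtering rows on another nested field. It is not Immuta's API or policy syntax: the helper names (mask_struct_field, get_field), the sample rows, the salt, and the allowed_countries set are all hypothetical and exist only for illustration.

```python
import copy
import hashlib


def get_field(row, path):
    """Walk a nested dict (standing in for a struct) and return the value at `path`."""
    node = row
    for key in path:
        node = node[key]
    return node


def mask_struct_field(row, path, salt="example-salt"):
    """Return a copy of `row` with the nested field at `path` replaced by a salted SHA-256 hash.

    The salt is a hypothetical placeholder, not a real configuration value.
    """
    masked = copy.deepcopy(row)  # leave the caller's data untouched
    node = masked
    for key in path[:-1]:
        node = node[key]
    raw = str(node[path[-1]])
    node[path[-1]] = hashlib.sha256((salt + raw).encode("utf-8")).hexdigest()
    return masked


# Hypothetical sample data: each row carries a nested "customer" struct.
rows = [
    {"id": 1, "customer": {"name": "Ada", "address": {"country": "US", "zip": "02110"}}},
    {"id": 2, "customer": {"name": "Lin", "address": {"country": "DE", "zip": "10115"}}},
]

# Row-level rule driven by a struct field: only rows from allowed countries are visible.
allowed_countries = {"US"}

# Column-level rule applied inside the struct: hash customer.name, leave the rest intact.
visible = [
    mask_struct_field(row, ("customer", "name"))
    for row in rows
    if get_field(row, ("customer", "address", "country")) in allowed_countries
]
```

In this sketch, visible holds a single row whose customer.name is a hash while the other fields of the struct are unchanged, which mirrors the two capabilities described above: masking a value within a struct and restricting rows based on a struct field.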

Performance and Scalability

Smart Mask Ordering in Spark
The order of masking pipeline steps can significantly impact performance when working with large data sets. For example, simply creating a view that hashes a column requires all the values in that column to be hashed before other operations, such as joining, can be performed.

Immuta has implemented logic that optimizes performance based on the types of queries and the masking techniques in use. By deferring the application of masking policies for as long as possible, column masking overhead is significantly reduced.
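
The effect of ordering can be illustrated with a small, self-contained Python sketch. It is not Immuta's implementation and does not use Spark: the in-memory rows, the salt, and the helper names (mask, join_on_address) are assumptions made for illustration. It compares hashing every value before a join with joining first and hashing only the rows that are actually returned.

```python
import hashlib

SALT = b"example-salt"  # hypothetical salt, for illustration only
hash_calls = 0          # counts how many values get hashed


def mask(value):
    """Salted SHA-256 hash of a string value; increments the call counter."""
    global hash_calls
    hash_calls += 1
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


def join_on_address(left, right):
    """Simple hash join on the 'address' key, returning matching (left, right) row pairs."""
    index = {}
    for row in right:
        index.setdefault(row["address"], []).append(row)
    return [(l, r) for l in left for r in index.get(l["address"], [])]


# Hypothetical data: 1,000 rows with unique addresses.
rows = [{"address": f"{i} Main St", "crime_code": str(i % 7)} for i in range(1000)]

# Eager masking: hash the whole column first (as a masking view would), then join and limit.
hash_calls = 0
masked_rows = [{"address": mask(r["address"]), "crime_code": r["crime_code"]} for r in rows]
eager = [(l["address"], r["address"]) for l, r in join_on_address(masked_rows, masked_rows)[:5]]
eager_calls = hash_calls

# Deferred masking: join and limit on raw values, then hash only the rows being returned.
hash_calls = 0
deferred = [(mask(l["address"]), mask(r["address"])) for l, r in join_on_address(rows, rows)[:5]]
deferred_calls = hash_calls

print(eager == deferred, eager_calls, deferred_calls)
```

Both strategies return the same masked output because the hash is deterministic, but the deferred version hashes only the values in the five returned rows rather than the entire column. A real query engine has more to consider (for example, whether the user is allowed to join on the masked column at all), so this is only a simplified picture of why ordering matters.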

To understand the impact of this innovation in practice, consider a performance test we recently ran using smart mask ordering.

Results from this performance innovation:
Query: select * from synthetic_crime_data a join synthetic_crime_data b on a.address = b.address and a.area_id = b.area_id and a.area_name = b.area_name and a.crime_code = b.crime_code limit 5;
Masked columns: address, area_id, area_name, crime_code masked by sha256 hashing with salt

Improved Performance Testing
Performance testing the Immuta integration with Databricks is now much simpler, thanks to the TPC-DS performance benchmark notebook that ships with our configuration.

Additional Data Access Control Enhancements

Usability

ADLS Gen 2 Support
With this release, users can enforce subscription policies on ADLS Gen 2 data sources. While most customers enforce fine-grained access control policies at the compute layer, there are scenarios where access control to binary objects, such as images stored in ADLS Gen 2, may be required.

Consistency

Multiple Native Integrations
Users can configure an unlimited number of integrations for each type of native access pattern in a single instance of Immuta. This capability spans Amazon Redshift, Azure Synapse Analytics, Databricks SQL Analytics, Trino, Starburst, and Snowflake, meaning that no matter how complex the enterprise data platform architecture, Immuta can provide consistent data access control across all data platforms and instances.

Next Steps

Most organizations struggle with cloud data management because of disparate access controls across platforms, rule complexity that limits the data available for use, and an explosion in the number and types of data users, all of which increase time-to-value from data.

Immuta’s unique approach to cloud data access control means teams can reduce policy burden by 75x, simplify cross-platform policy management, and enable organization-wide adoption – free of time- and labor-intensive manual processes.

If you are new to Immuta, there’s no better way to get a high-level overview of key capabilities than by booking a capabilities briefing with our team.
