Secure Data Collaboration, Privacy, and Compliance for Databricks

Databricks shops are expanding their analytical use cases and, in the process, bringing more sensitive data and more data scientists to the data platform. The rules governing what each data scientist is permitted to see are becoming increasingly complex, shaped by CCPA, HIPAA, and other privacy regulations as well as internal rules. How can you apply fine-grained access controls? How do you implement and prove regulatory compliance? How can you prevent data leaks in these complex environments?

At Immuta, we are seeing an influx of Databricks customers who need to automate data security and privacy controls to scale cloud analytics and data science initiatives. Our latest release – launching in a few weeks – is focused on addressing these challenges.

Specifically, this new release includes:

  • Secure Data Collaboration – make life easier for data engineers managing data-level zones for read/write access across users with different permissions.
  • Regulatory policies for CCPA and HIPAA Safe Harbor – decrease risk of fines, breaches or unwanted news headlines by automating the manual steps required to enforce compliant data access.
  • Advanced privacy – randomized response, the latest innovation in our suite of dynamic Privacy Enhancing Technologies (PETs) building on k-anonymization, protects against attacks that can draw damaging conclusions about individuals, while preserving data utility.

 

Secure Data Collaboration

Data engineers build ML data pipelines with a series of steps, from raw ingestion, to refined data, to aggregated data ready for use. The potential for leaks in this pipeline process exists when users with different permissions, or fine-grained access controls, are producing new tables such as feature stores or transformations across a Databricks cluster.

Our upcoming release features Secure Data Collaboration, a new capability based on Immuta’s patented approach to equalizing permissions to read and write data across the pipeline to prevent data leakage. This feature uses Projects in Immuta, which dynamically enforce equalized permissions across users on Spark jobs in Databricks, thereby protecting data platforms from leakage.
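One way to picture equalization is as a set intersection: everyone working in a shared project is limited to the entitlements that all members hold, so nothing written by one member can expose data another member was not allowed to read. The sketch below illustrates that idea only. It is not Immuta’s implementation or API, and the names (equalized_entitlements, can_read, the entitlement labels) are hypothetical.

```python
from typing import Dict, Set


def equalized_entitlements(members: Dict[str, Set[str]]) -> Set[str]:
    """Return the entitlements held by every project member.

    Assumption for this sketch: "equalizing" means restricting the whole
    project to the intersection of its members' entitlements.
    """
    if not members:
        return set()
    return set.intersection(*members.values())


def can_read(required: Set[str], equalized: Set[str]) -> bool:
    """A table is readable in the project only if every entitlement it
    requires is part of the equalized set."""
    return required <= equalized


# Hypothetical project with two members holding different entitlements.
project_members = {
    "alice": {"finance", "pii"},
    "bob": {"finance"},
}
project_level = equalized_entitlements(project_members)

finance_ok = can_read({"finance"}, project_level)
pii_ok = can_read({"pii"}, project_level)
```

In this toy project, both members work at the shared "finance" level, so a derived table that Alice writes cannot carry "pii" data through to Bob.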

https://www.immuta.com/wp-content/uploads/2020/06/projects2@2x.png

 

Regulatory policies for CCPA and HIPAA

How do you implement rules and prove regulatory compliance when using personal or sensitive data for analytics or ML/AI projects? Our customers’ data teams are increasingly being tasked with manually implementing rules and regulations that carry significant risk of fines, embarrassing breaches, or even gross negligence that can make individuals personally liable.

Immuta’s new Starter Policies, available for the first time in our upcoming release, simplify compliance with privacy regulations like CCPA and HIPAA’s Safe Harbor. These templated policies enable data teams to quickly automate the many manual steps involved in preparing data for compliant analysis. With Immuta, you can now automatically discover sensitive data subject to regulation, apply Starter Policies, customize them to your needs, and prove compliance using detailed audit logs and consent workflows. Get the full transparency required by your compliance and legal stakeholders so you can do more with Databricks, even with sensitive data.

https://www.immuta.com/wp-content/uploads/2020/06/CCPA-Global-Starter-Policy-1.png

 

Advanced privacy

Part of the journey to regulatory compliance for data teams is providing proper de-identification for PII or ePHI data. But how can you further prevent leaking sensitive information that attackers might use to draw damaging conclusions about individuals? Privacy-enhanced analysis of sensitive data is a notoriously complex topic, often involving the risky and time-consuming implementation of difficult-to-understand-and-debug statistical techniques and complex ETL processes.

Our upcoming release features Immuta’s newest privacy enhancing technology, randomized response, which helps you achieve local differential privacy for specific columns containing sensitive information about individuals. Local differential privacy, like its close relative, differential privacy, places mathematically guaranteed limits on an attacker’s ability to exploit the data in attempts to draw harmful or embarrassing conclusions about individual data subjects. Our CTO Steve Touw discusses this new technology at Spark & AI Summit, using a real-world example: how much celebrities tip using the public NYC Taxi data set.
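To make the idea concrete, here is a minimal sketch of randomized response for a single categorical column. It illustrates the general textbook technique, not Immuta’s implementation: the function names, the epsilon parameter, and the example categories are all assumptions made for this illustration. Each value is reported truthfully with probability e^epsilon / (e^epsilon + k - 1), where k is the number of categories, and is otherwise replaced by one of the other categories chosen uniformly. Because the noise is known, aggregate frequencies can still be estimated.

```python
import math
import random


def keep_probability(epsilon: float, num_categories: int) -> float:
    """Probability of reporting the true value under k-ary randomized response."""
    e = math.exp(epsilon)
    return e / (e + num_categories - 1)


def randomized_response(value, categories, epsilon, rng=random):
    """Report the true value with probability p, otherwise a different
    category chosen uniformly at random."""
    p = keep_probability(epsilon, len(categories))
    if rng.random() < p:
        return value
    others = [c for c in categories if c != value]
    return rng.choice(others)


def estimate_frequency(reported, target, categories, epsilon):
    """Unbiased estimate of the true share of `target`, correcting for the
    known noise: observed = f * p + (1 - f) * q, solved for f."""
    k = len(categories)
    p = keep_probability(epsilon, k)
    q = (1 - p) / (k - 1)
    observed = sum(1 for r in reported if r == target) / len(reported)
    return (observed - q) / (p - q)


# Hypothetical example: a yes/no sensitive attribute, 30% true "yes".
categories = ["yes", "no"]
epsilon = math.log(3)  # keeps the true answer 75% of the time when k = 2
rng = random.Random(0)
truth = ["yes"] * 6000 + ["no"] * 14000
noisy = [randomized_response(v, categories, epsilon, rng) for v in truth]
estimated_yes = estimate_frequency(noisy, "yes", categories, epsilon)
```

In this sketch, a smaller epsilon adds more noise and gives stronger protection for each individual row, at the cost of a noisier aggregate estimate.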

https://www.immuta.com/wp-content/uploads/2020/06/direct-identifiers-indirect-identifiers-sensitive-data@2x-1.png

Get Started with Immuta for Databricks: New Free Trial

In addition to all these great new capabilities, we’re also launching something entirely new this week: a 14-day free trial. Time and time again, our customers tell us that seeing is believing with Immuta. You really have to try it to understand how automation can redefine your approach to security, privacy and access control. So, give it a try. If you’re new, be sure to take the built-in guided tour of our core capabilities. And if you’re a Databricks shop, the trial now includes a guided configuration for your AWS or Azure environment. We’re also available to help out on live chat. That reminds me that I need to get my chat account set up, because I would love to hear from you!

Want To See For Yourself?

Start a Free Trial Today
