We heard from several of our customers that managing large shared metastores can be challenging, primarily because users have access to all data stored in a cluster’s managed tables. Table ACLs address part of this challenge by protecting tables for SQL and Python after enabling table access control for the cluster.
But in R and Scala notebooks, it is difficult to manage operations that modify tables, such as ALTER, CREATE, and DROP statements (DDL) or DELETE and UPDATE statements (DML), on a cluster where table access control is either not supported or not enabled.
Databricks DDL operations control using Immuta
Immuta offers Databricks access control for R and Scala. While our automated security and privacy controls support Databricks environments in which all rules are dynamically enforced on Spark jobs, this article focuses on using table access controls to manage DDL operations in Databricks.
Our customers have raised the need to protect against DDL operations that modify tables for different users, such as data scientists working in R or data engineers working in Scala across a standard Databricks cluster. To illustrate, I created a table “sumit.boxes” last week but later accidentally deleted “default.boxes” created by our CTO working on a demo in the same cluster without Immuta enabled. Ok is not actually ok. [expletive]
While our customers are definitely smarter than me, it’s important for them to have a data platform experience in which rules are enforceable and auditable, especially in sensitive data environments.
How to control create/drop/alter operations in Databricks using Immuta
Immuta’s fine-grained access control for R and Scala notebooks works on any cluster. It builds on existing support for SQL and Python without requiring table access control to be enabled on the cluster.
Here are the high-level steps to protect against unintended create/drop/alter operations:
- Configure Immuta for your Databricks cluster. To get started, you can review the installation guide for details and prerequisites.
- Register the table(s) you want to expose to that cluster. This is a virtual reference, so no data is actually moved to Immuta.
With these capabilities now configured, when I try to drop a table in the wrong database, I’ll get an error based on Immuta’s access controls. By default, these protections deny all DDL operations to alter a table or its data.
Error in SQL statement: AnalysisException: [email protected] is unable to perform this operation on the database default outside of Immuta workspaces. The user does not have a current project set in Immuta, which is required to access a workspace database;; DropTableCommand `default`.`boxes`, false, false, false
But since I’m working within the Immuta-enabled cluster, I am still able to create a transformation and fetch data from the new table, with Immuta’s access controls applied, using an R notebook (which is pretty cool).
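As a rough mental model of the default denial shown in the error above, here is a minimal Python sketch. It is not Immuta's implementation or API: the names `is_ddl` and `authorize` are hypothetical, and the sketch only covers the one rule the error message states, namely that a create/drop/alter statement is rejected when the user has no current project set. What happens once a project is set is outside its scope.

```python
from typing import Optional

# Hypothetical model of the default-deny behavior; not Immuta's implementation.
DDL_KEYWORDS = {"CREATE", "ALTER", "DROP"}


def is_ddl(statement: str) -> bool:
    """True if the first keyword of the statement is CREATE, ALTER, or DROP."""
    words = statement.strip().split()
    return bool(words) and words[0].upper() in DDL_KEYWORDS


def authorize(statement: str, current_project: Optional[str]) -> None:
    """Raise PermissionError for DDL issued without a current project."""
    if is_ddl(statement) and current_project is None:
        raise PermissionError(
            "DDL denied: no current project set, so no workspace database is available"
        )


# Usage: a SELECT passes this check; a DROP with no project set is rejected.
authorize("SELECT * FROM default.boxes", current_project=None)
try:
    authorize("DROP TABLE default.boxes", current_project=None)
except PermissionError as err:
    print(err)
```

The check keys off the first keyword only, which is enough to illustrate the idea but would not be a robust SQL parser.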
How to enable create/drop/alter permissions
The new secure data collaboration capability in Immuta uses the concept of “Immuta Projects” to manage WRITE operations transparently in Databricks clusters.
Here are the high-level steps to safely permit create/drop/alter operations:
- Create a native Databricks workspace, which I have named “mydbws,” by specifying the storage layer on either AWS or Azure.
- Create an “Immuta Project,” which provides a safe collaboration space in which the data platform admin role can grant access to specific data sets — with data policies applied — and enforce rules for how members of that project collaborate on available data. To do this, click on the “Project” icon on the left menu and then “+ New Project.” Specify the purpose for use (in my case, HR analytics) and decide if you want members to acknowledge their intended use for auditing purposes (in my case, I do).
- From Databricks, all users who are members of the project can now work safely together in the cluster to create and modify tables in the native workspaces managed by Immuta.
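The steps above boil down to a simple rule: project members can create and modify tables, but only inside the workspace tied to their project. The Python sketch below illustrates that rule under stated assumptions. The `Project` class and `can_modify` function are hypothetical and are not part of Immuta or Databricks, and the sketch assumes the workspace database carries the workspace name “mydbws” from the example.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Project:
    """Hypothetical stand-in for an Immuta Project tied to one workspace database."""
    name: str
    workspace_db: str
    members: Set[str] = field(default_factory=set)


def can_modify(user: str, database: str, project: Project) -> bool:
    """Allow create/drop/alter only for project members, only in the workspace database."""
    return user in project.members and database == project.workspace_db


# Illustrative project; the name and member are made up for this example.
hr = Project(name="hr_analytics", workspace_db="mydbws", members={"sumit"})

print(can_modify("sumit", "mydbws", hr))   # member, inside the workspace database
print(can_modify("sumit", "default", hr))  # member, but outside the workspace
print(can_modify("cto", "mydbws", hr))     # not a member of the project
```

Only the first call is permitted. The same member is refused in the `default` database, which is the kind of accidental drop described earlier in this article.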
Get started protecting all of your clusters
Beyond controlling DDL operations in Databricks, Immuta enables teams to manage fine-grained access controls across different users, with further protections against data leaks when users write data to a cluster where less privileged users could view it.
If you’re interested in learning more about safe data collaboration in Databricks clusters, as well as automated data-level security and privacy controls, our team would love to walk you through the capabilities that work transparently with Databricks. Request a demo to talk to us about what you can do with Immuta and Databricks.