8 Signs You Need Data Access Control for Databricks

Imagine this —

You’re at the starting line of a road race. The gun goes off and the clock starts ticking. Runners take off — as you bend down to tie your shoes.

You don’t have to be a runner to know that this tactic ends in your competitors leaving you in the dust. You’d avoid this in a recreational setting — so why wouldn’t you in a professional one? Spending too much time preparing data for use stifles its ability to deliver meaningful, timely insights, giving competitors ample opportunity to outperform you. Automated data access gets your data to the starting line faster, accelerating data analytics timelines.

The issue hindering time to data access generally isn’t data teams’ capabilities — it’s the lack of tools they have to implement and simplify data access control. Immuta, the leader in automated data access, integrates with Databricks, the data and AI company, to help customers overcome data access control challenges while maximizing data’s utility and security, so organizations can reap the time and revenue benefits of fast, compliant data access and analysis.

A Guide to Data Access Governance with Immuta and Databricks spells out in detail exactly how this works. But because scalable, secure data access control is becoming vitally important, particularly as organizations accelerate cloud adoption and scale workloads, identifying the signs that your data team may need Immuta for Databricks can save time and money and, most importantly, maximize your data’s value.

Here are eight indications it might be time to add automated data access control to your Databricks platform:

1. You’re dealing with role explosion from static access controls.

According to Immuta’s research, 80% of data teams use role-based access control (RBAC) or “all-or-nothing” access control policies to manage who has access to what data. Yet, as more organizations move to being primarily or entirely cloud-based, there will be more data available and more users who want or need to access it.

RBAC’s static approach requires data engineers and architects to create roles for new users or data, which quickly leads to role explosion, and could be why you have more roles than people in your organization. With potentially hundreds or thousands of user roles to manage, it’s difficult to keep track of which roles carry which access permissions. This can easily lead to inconsistent data access across platforms. RBAC’s outdated approach to access control simply won’t cut it.

Immuta’s native integration with Databricks uses attribute-based access control (ABAC) to grant or restrict access to data based on distinct sets of attributes. Databricks customers report reducing the number of roles in their systems by 100 times when using Immuta’s attribute-based access control.
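To make the contrast concrete, here is a minimal sketch of the idea, not Immuta’s or Databricks’ actual API. The `User`, `Table`, and `can_read` names, and the attribute strings, are all hypothetical. It counts how many static roles a small organization would need to distinguish every combination of department, region, and clearance, then shows a single attribute rule covering the same ground.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # hypothetical labels, e.g. {"dept:finance", "region:eu"}


@dataclass(frozen=True)
class Table:
    name: str
    required: frozenset  # attributes a user must hold to read the table


def can_read(user: User, table: Table) -> bool:
    """ABAC-style rule: grant access when the user holds every required attribute."""
    return table.required <= user.attributes


# Static RBAC needs one role per combination of values it has to distinguish.
departments = ["finance", "marketing", "hr", "sales"]
regions = ["us", "eu", "apac"]
clearances = ["public", "internal", "restricted"]
rbac_roles = [f"{d}_{r}_{c}" for d, r, c in product(departments, regions, clearances)]

alice = User("alice", frozenset({"dept:finance", "region:eu", "clearance:restricted"}))
eu_ledger = Table("eu_ledger", frozenset({"dept:finance", "region:eu"}))
us_payroll = Table("us_payroll", frozenset({"dept:hr", "region:us"}))

print(len(rbac_roles))              # 4 * 3 * 3 = 36 static roles
print(can_read(alice, eu_ledger))   # True
print(can_read(alice, us_payroll))  # False
```

Adding a fifth department grows the static role list by nine entries, while the attribute rule stays the same: new users and tables only need the right labels.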

2. Your access controls are overly restrictive or broad.

Using RBAC or an “all-or-nothing” data access model requires policies to be defined up front. This means that data teams using Databricks must remember to create or update policies before new data enters the platform. When ingesting large amounts of data, this process can result in access controls that are overly restrictive or overly broad.

Overly broad access controls introduce substantial risk for a data breach or leak, which could result in costly fines — averaging $3.9M — as well as personal liability for the data teams involved. On the other hand, overly restrictive access controls may lead data consumers to request access on an individual basis, forcing data engineers to spend time acknowledging, assessing, and acting upon each request. Not only does this slow time to data access and necessitate new roles (see: role explosion), but it also reduces data teams’ productivity and makes an organization’s entire data pipeline less efficient.

Immuta acts as an integrated layer within Databricks that maps attributes to policies and allows or denies access at query time based on context. With Immuta, Databricks customers report a 300% increase in data utilization, largely resulting from the ability to proactively set appropriate, dynamic access controls.
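As a rough illustration of what a query-time decision based on context means, the sketch below filters rows when the query runs rather than granting or denying a whole table in advance. It is an assumption-laden toy, not how Immuta is implemented: `run_query`, `ALLOWED_PURPOSES`, and the sample rows are invented for this example.

```python
ROWS = [
    {"customer": "A", "region": "eu", "balance": 120},
    {"customer": "B", "region": "us", "balance": 75},
    {"customer": "C", "region": "eu", "balance": 310},
]

# Hypothetical context rule: the table may only be queried for these purposes.
ALLOWED_PURPOSES = {"billing", "fraud-review"}


def run_query(rows, user, purpose):
    """Evaluate the policy at query time using the caller's attributes and purpose."""
    if purpose not in ALLOWED_PURPOSES:
        raise PermissionError(f"purpose {purpose!r} is not approved for this table")
    # Row-level filter: return only rows from regions the user is entitled to see.
    return [row for row in rows if row["region"] in user["regions"]]


analyst = {"name": "dana", "regions": {"eu"}}

visible = run_query(ROWS, analyst, purpose="billing")
print([row["customer"] for row in visible])  # ['A', 'C']

try:
    run_query(ROWS, analyst, purpose="marketing")
except PermissionError as err:
    print("denied:", err)
```

Because the check runs per query, new rows are covered as soon as they land, and nobody has to create a role ahead of time.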

3. You manually implement policies for each platform.

“You’d think when an organization would move to the cloud, they would try to consolidate [databases], but that has not been the case,” Immuta’s CTO and co-founder, Steve Touw, told Datanami. “One data warehouse or one lake isn’t going to solve all your problems, so the data ends up in multiple places and multiple different computes.”

Databricks users are typically not operating in a Databricks vacuum — many have a diverse data stack that continues to expand and evolve. Yet, implementing the same policies in Databricks and across every other independent platform in an organization’s data ecosystem is a drag on data teams’ time and productivity.

Immuta’s global policies can be written and applied to all data sources across an organization, allowing Databricks users to consistently enforce cross-platform data access control. This, combined with scalable, dynamic access controls, has improved data engineering productivity by 40%.
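The write-once, apply-everywhere idea can be sketched as follows. This is a simplified model under stated assumptions, not Immuta’s policy engine: the `Column` and `GlobalPolicy` classes, the `pii` tag, and the `other_warehouse` platform name are all hypothetical. A policy is keyed on a tag instead of a specific table, so it applies wherever that tag appears.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    platform: str   # where the data lives
    table: str
    name: str
    tags: frozenset  # labels attached during data discovery


@dataclass(frozen=True)
class GlobalPolicy:
    tag: str     # the policy targets a tag, not a platform or table
    action: str


def enforcement_plan(columns, policies):
    """List every (platform, column, action) the global policies imply."""
    plan = []
    for col in columns:
        for policy in policies:
            if policy.tag in col.tags:
                plan.append((col.platform, f"{col.table}.{col.name}", policy.action))
    return plan


CATALOG = [
    Column("databricks", "sales.customers", "email", frozenset({"pii"})),
    Column("databricks", "sales.customers", "country", frozenset()),
    Column("other_warehouse", "crm.contacts", "email_address", frozenset({"pii"})),
]
POLICIES = [GlobalPolicy(tag="pii", action="mask")]

plan = enforcement_plan(CATALOG, POLICIES)
for step in plan:
    print(step)
```

One policy definition yields a masking step on both platforms, and the untagged `country` column is left alone.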

4. You have no way of managing consistency.

Implementing consistent policies for universal cloud compatibility is critical to maintaining data security and privacy, but consistency is a guessing game without a way to monitor and manage it.

Immuta’s native integration provides Databricks users with a centralized plane from which to oversee policies and access controls, which enables data teams to see policies across all data sets. And, because Immuta’s as-code policy builder allows data engineers and architects to author policies in plain English, compliance and security stakeholders can easily understand access controls, thereby improving collaboration and communication amongst teams.

5. You’re not a legal expert or policymaker.

There’s a reason organizations in every industry employ or consult with lawyers: legalese is not easy to understand. And, since most data engineers are not legal experts, this can be a barrier to secure, trustworthy data access control.

This is particularly true in today’s increasingly regulated environment. Rules and regulations are also frequently amended and updated, adding a layer of complexity for data teams. Take, for instance, the CCPA. The act was signed into law in June of 2018 and its amendment, the CPRA, takes effect in January of 2023, less than five years later. This is just one of many changes data engineers and architects must absorb and respond to efficiently to avoid penalties.

Immuta streamlines regulatory compliance by providing Databricks users with starter policies that meet the requirements of HIPAA’s Safe Harbor Policy and the CCPA, helping to safeguard Databricks users from potential penalties.

6. You have no way of preserving original data.

The amount of sensitive data available in today’s environment and the sophistication of modern data analytics platforms means privacy-enhancing technologies (PETs) and dynamic data masking are critical functions in a data access control solution. Without this functionality, organizations are at greater risk of data leaks, breaches, or re-identification.

However, masking data often means fundamentally altering it. This means original data can’t be referenced or leveraged for future use.

Immuta’s dynamic data masking capabilities and PETs — including randomized response, k-anonymization, and differential privacy — protect sensitive data while enabling its utilization, and preserve its original format for non-production use. As a result, Databricks users are able to increase permitted use cases for cloud analytics from 25% without Immuta to 90% with it, simply by safely unlocking sensitive data.
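Two of the ideas above can be shown in a few lines: masking applied when data is read, so the stored values are never altered, and a basic k-anonymity check. This is a generic sketch, not Immuta’s implementation. The salted-hash masking, the function names, and the sample records are assumptions chosen for illustration.

```python
import hashlib
from collections import Counter


def mask(value: str, salt: str = "demo-salt") -> str:
    """Replace a value with a short salted hash (an illustrative masking choice)."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]


def masked_view(rows, masked_columns):
    """Build masked copies at read time; the original rows are left untouched."""
    return [
        {key: mask(str(val)) if key in masked_columns else val for key, val in row.items()}
        for row in rows
    ]


def is_k_anonymous(rows, quasi_identifiers, k):
    """True when every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in counts.values())


PATIENTS = [
    {"name": "Ana", "zip3": "021", "age_band": "30-39"},
    {"name": "Ben", "zip3": "021", "age_band": "30-39"},
    {"name": "Cy", "zip3": "100", "age_band": "40-49"},
    {"name": "Di", "zip3": "100", "age_band": "40-49"},
]

view = masked_view(PATIENTS, masked_columns={"name"})
print(view[0]["name"] != "Ana")   # True: consumers see the masked value
print(PATIENTS[0]["name"])        # Ana: the stored record is unchanged
print(is_k_anonymous(view, ["zip3", "age_band"], k=2))  # True
print(is_k_anonymous(view, ["zip3", "age_band"], k=3))  # False
```

Because the mask is computed on the way out, the same table can serve masked results to one audience and original values to another without keeping two copies.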

7. You’re expected to provide data access — yesterday.

Time to data access is one of the biggest challenges for data teams. Arduous approval workflows are only compounded by the (often manual) processes of sensitive data discovery, policy creation, and role management. If data utilization takes months, how much does its value depreciate?

According to Gartner’s analysis of data science teams, nearly half of the time spent on data projects is on tasks that take place before even developing models or conducting problem analysis. Considering the number of new and existing data sets available to data teams — and the business-driving insights they can deliver — this ratio is strikingly unbalanced.

For Databricks customers, however, Immuta’s native integration streamlines these time-consuming processes and accelerates time to data access. Databricks users report that Immuta’s ability to provide secure, self-service data access reduces typically months-long processes to mere seconds.

8. You can’t easily scale your data.

As data becomes more critical to driving business initiatives and insights, both data sets and data consumers will continue to grow exponentially. According to Immuta’s research, 75% of organizations are collecting and using or planning to use sensitive data for analytics. But as these numbers grow, scaling data access and utilization becomes unmanageable — without the right tools.

Role explosion, static access controls, and platform-by-platform policy implementation are among the factors that can inhibit data scalability. As the numbers of data sets, consumers, and regulations scale, the risk to data security and privacy scales as well.

Databricks customers like WorldQuant Predictive have leveraged Immuta’s global policies and automated, dynamic access controls to scale data — even the most sensitive data — to rapidly growing populations of data consumers. By streamlining roles, enabling global policy creation, and applying dynamic access controls, Immuta allows Databricks users to reduce risk and time in scaling sensitive data use.

Databricks and Immuta seamlessly implement automated data access control in a best-of-breed data analytics platform, empowering data teams to do more with their data. To learn more about Immuta’s native integration with Databricks, download A Guide to Data Access Governance with Immuta and Databricks.

To get up and running with Immuta for Databricks faster than ever, start a free trial today.
