Immuta Data Blueprints

Step-by-step guides for implementing dynamic data security for data science and analytics use cases.

To stay ahead in today’s data-driven world, organizations in every sector must have the ability to share data — including sensitive data — both internally and externally, without violating an ever-changing body of data use rules and regulations.

Even federal government agencies are not exempt from the need to share data, or from the challenges it often poses for data engineering and operations teams. In fact, the public sector is perhaps more reliant on data sharing than commercial entities are, since a failure to collaborate efficiently on data could have wide-ranging implications for everything from national security to public health and legislation. Deputy Secretary of Defense Kathleen Hicks was recently quoted as saying, “Leaders at all levels have a responsibility to manage, understand, and responsibly share and protect data in support of our shared mission.”

Traditional processes slow down data access: according to five government agencies we surveyed, getting access to data takes an average of two and a half months. Data use is moving away from manual processes, siloed data use, and structured data, and toward automation, collaboration, and ad hoc analysis; data cultures and tools are evolving along with it. To reap the benefits of best-of-breed technologies without sacrificing privacy or utility, organizations must invest in cloud data platforms that are compatible with one another and able to automate critical parts of the data pipeline.

Simplifying Data Access & Control

Immuta’s native integration with Starburst accelerates data use by simplifying data access and control. This is achieved through the separation of policy and platform: Dynamic policies are created in Immuta, which hooks into Starburst so that Starburst can automatically federate queries to all other connected platforms. Data governance teams save time creating policies without having to write code, data engineers can easily apply the rules across platforms like Starburst, legal and compliance teams have full transparency into data policies’ contents, and data users benefit from self-service data access.

To illustrate how this works in practice, let’s walk through an example of how to enable secure data sharing of highly sensitive government data.

How to Implement Attribute-Based Access Control in Starburst

Once the data sources in your Starburst cluster have been registered with Immuta and tagged for direct identifiers, indirect identifiers, and sensitive attributes, the first step in preparing data to be shared and used collaboratively is to create a subscription policy using attribute-based access control.

In this case, we have a set of global points of interest, classified as foundational intelligence, that needs to be automatically available to new users with the right permissions. Automating this process removes the need for manual role updates each time a user is added or removed, an approach that inevitably leads to unmanageable role explosion.

Through the Immuta console, you navigate to the Policies tab to create a Global Subscription Policy. This type of policy works across data sources and is driven by tags and attributes that tell you the kind of information the data source contains, the function it performs, and the users that might need to access it.

In this example, you need to make this data source available to users in the Intel Analysts group. Since membership in this specific group is considered an attribute of the people accessing the data, you can build the policy around it, as shown below.

During this step, you’ll also select where the policy should be applied — in this case, on data sources tagged as Intelligence Class.Foundational, meaning the data contains foundational intelligence.

It’s important at this step to ensure that the appropriate users are assigned the right attributes, in this case membership in the Intel Analysts group. Because the subscription policy applies only to Intel Analysts, members of that group are automatically subscribed to the data and able to query it, while users outside the group are not.
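The subscription logic described above can be sketched in a few lines of Python. This is a simplified model for illustration, not Immuta's implementation or API: the `User`, `DataSource`, and `SubscriptionPolicy` classes and the sample user and data source names are all hypothetical. Only the group name and the tag come from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset  # group memberships, treated as user attributes


@dataclass(frozen=True)
class DataSource:
    name: str
    tags: frozenset  # classification tags applied to the data source


@dataclass(frozen=True)
class SubscriptionPolicy:
    required_group: str  # attribute a user must hold to be subscribed
    target_tag: str      # tag that brings a data source into scope

    def applies_to(self, source: DataSource) -> bool:
        return self.target_tag in source.tags

    def subscribes(self, user: User, source: DataSource) -> bool:
        # Access follows from attributes alone; no per-user role is edited.
        return self.applies_to(source) and self.required_group in user.groups


policy = SubscriptionPolicy("Intel Analysts", "Intelligence Class.Foundational")
points = DataSource(
    "points_of_interest",  # hypothetical data source name
    frozenset({"Intelligence Class.Foundational"}),
)
users = [
    User("analyst_a", frozenset({"Intel Analysts"})),
    User("logistics_1", frozenset({"Logistics"})),  # hypothetical non-member
]
subscribed = [u.name for u in users if policy.subscribes(u, points)]
print(subscribed)
```

Adding a new analyst to the group is enough to subscribe them; nothing in the policy itself has to change, which is what keeps role explosion in check.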

Now, let’s shift from data access capabilities to how you can control secure data sharing.

How to Set Up a Project for Starburst Data Collaboration

One of the key challenges data engineering and operations teams face is how to ensure that data consumers with varying access control attributes can work collaboratively on data sets without inadvertently exposing unauthorized data. To achieve this, data teams often adhere to the principle of least privilege, which holds that any user or system should have only the minimum data access privileges necessary to perform a function or task. In practice, this conservative approach helps mitigate the risk of a user having unauthorized access to data.

Immuta Projects apply the principle of least privilege by equalizing data access across a specific project, effectively creating a Starburst clean data room for secure collaboration. When creating a Project, you can select the data sources that should populate the Project, as well as corresponding purposes and tags. For instance, you can add the data source from earlier, complete with its subscription policies, to a collaborative Project space and add Counter Intelligence as one of the purposes for using the data sets.

To build on our earlier example, let’s assume that a small group of Intel Analysts are given an assignment to analyze data from various types of locations in a given Area of Responsibility (AOR), in this case Mali. However, each of the analysts in this group has attribute-based access rights for different AORs. For instance, Analyst A might have access to data on Mali and Tanzania, but Analyst B is only authorized to see data on Mali; therefore, when equalizing access rights for the Project, information on Tanzania must be hidden.

To do this, you create a new row-level policy in the Data Source tab of your Immuta console that only shows rows where the user possesses an attribute in AOR Countries that matches the value in the column Country.
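As a rough illustration of what this rule does at query time, the filter can be modeled as a membership check between a row's Country value and the user's AOR Countries attribute. The function name `visible_rows` and the sample rows are hypothetical, and the real enforcement happens inside the query engine rather than in application code.

```python
def visible_rows(rows, user_attributes, column="Country", attribute="AOR Countries"):
    """Keep only rows whose column value appears in the user's attribute values."""
    allowed = set(user_attributes.get(attribute, ()))
    return [row for row in rows if row[column] in allowed]


# Made-up sample rows; the real data set is not reproduced here.
rows = [
    {"Name": "Example Site 1", "Country": "Mali"},
    {"Name": "Example Site 2", "Country": "Tanzania"},
    {"Name": "Example Site 3", "Country": "Mali"},
]
analyst_a = {"AOR Countries": ["Mali", "Tanzania"]}
analyst_b = {"AOR Countries": ["Mali"]}

rows_for_a = visible_rows(rows, analyst_a)  # Mali and Tanzania rows
rows_for_b = visible_rows(rows, analyst_b)  # Mali rows only
print(len(rows_for_a), len(rows_for_b))
```

A user with no AOR Countries attribute at all gets an empty result in this sketch, which matches the least-privilege default.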

Now, queries will only show data for countries that match a user’s approved AOR countries attribute. When Analyst B, who can only see records containing Mali in the Country column, runs a query, it will return the following results:

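When this data source is added to a Project that includes Analyst B, Analyst A will also see only records containing information about Mali while working in that Project, because the Project equalizes access to what every member is authorized to see.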
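One way to reason about equalization is as a set intersection: within the Project, each attribute is reduced to the values that every member holds. The sketch below models that idea. It is an assumption made for illustration, not a description of how Immuta computes equalized entitlements, and `equalized_values` is a hypothetical helper.

```python
def equalized_values(members, attribute="AOR Countries"):
    """Return the attribute values that every project member has in common."""
    value_sets = [set(member.get(attribute, ())) for member in members]
    if not value_sets:
        return set()  # an empty project shares nothing
    return set.intersection(*value_sets)


project_members = [
    {"AOR Countries": ["Mali", "Tanzania"]},  # Analyst A
    {"AOR Countries": ["Mali"]},              # Analyst B
]
shared = equalized_values(project_members)
print(shared)
```

Because Tanzania is missing from Analyst B's attributes, it drops out of the shared set, so both analysts work from the Mali records only.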

Depending on the intended use for this data by the Intel Analysts, this data set may still not be adequately protected; despite masking the last eight characters in this query, it’s still possible for anyone with access rights to re-identify some of the longer names. This is where usage purpose comes into play.

As you’ll recall, we added this data source to a Project with a Counter Intelligence purpose. Now you can create a data access control policy that masks data based on the user’s purpose for accessing it. For example, if you’d like to mask records in the Name column using a masking function, and apply it to everyone except those acting under a Counter Intelligence purpose, your policy builder would look like this:

If you’re not a member of the Project, the records in the Name column are fully masked when you run the query:

However, if you’re a member of this Project you’ll see the data with names exposed. Additionally, since Projects equalize the data across collaborators, only the Mali records are shown. This means Analysts A and B can both access and collaborate on the same data set without risk of unauthorized or inadvertent exposure to sensitive data.
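The purpose-based rule can be sketched the same way: the Name column is masked for everyone except users acting under the exempt purpose. The `mask_names` function, the `MASK` placeholder, and the sample row are hypothetical, and the constant replacement stands in for whatever masking function the policy actually applies.

```python
MASK = "REDACTED"  # stand-in for the policy's masking function


def mask_names(rows, active_purposes, exempt_purpose="Counter Intelligence", column="Name"):
    """Mask one column unless the user is acting under the exempt purpose."""
    if exempt_purpose in active_purposes:
        return [dict(row) for row in rows]  # copies, values left in the clear
    return [{**row, column: MASK} for row in rows]


# Made-up sample row for illustration.
rows = [{"Name": "Example Site 1", "Country": "Mali"}]

outside_project = mask_names(rows, active_purposes=set())
inside_project = mask_names(rows, active_purposes={"Counter Intelligence"})
print(outside_project[0]["Name"], "|", inside_project[0]["Name"])
```

The function returns new rows rather than editing the input, mirroring the fact that the policy changes what a query returns, not the underlying data.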

To ensure that Projects adhere to their intended purposes and that data access controls are working as intended, administrators can use Immuta’s automated reporting and data audit trails to see changes to policies, who queried what data, when, and what data the query returned.

This is critical in the government, as well as highly regulated industries like financial services, insurance, healthcare, and consumer goods, because it helps prove compliance, investigate anomalies, identify unauthorized access, and prevent data access control failures before they occur.

Immuta natively integrates with Starburst and is therefore deployed on every node within Starburst clusters. This allows Immuta’s attribute- and purpose-based access controls to be enforced consistently across all data sources and databases that Starburst connects to, including NoSQL databases, relational databases, and all compute infrastructures. Below, you can see how the category and subcategory data from the Mali records of your Project might look when connected to Tableau through Starburst.

By separating policy from compute platform and natively integrating with Starburst, Immuta centralizes and dynamically applies access controls across all data sources and data sets within your Starburst environment. When sharing and collaborating on data, particularly highly sensitive data, this integration is critical for accelerating speed to data access, simplifying access control, avoiding data copies and role explosion, and auditing everything in real time. Now, you can streamline data sharing and rest assured knowing your data is being used efficiently and securely.