The prevalence of AWS in today’s data environments
Amazon Web Services (AWS) is a leading cloud computing platform that comprises a number of services and is trusted by many of the largest and complex organizations in the world. In fact, AWS holds nearly a third of the global cloud infrastructure market, ahead of both Microsoft and Google.
Traditionally, these AWS services focused on general compute workloads, which removed the need for organizations to purchase and manage physical servers. Amazon EC2 was one of the most notable services to deliver this option, but over the years AWS has released many new innovative and elegant services to allow organizations to run their entire data environment on AWS.
As more data teams make this transition, they also need a way to ensure data is provisioned quickly, consistently, and securely across any and all connected AWS services. In this blog, we’ll take a closer look at how Immuta simplifies this objective and ensures that you can maximize the value of your AWS investments.
The breadth of AWS cloud services
At a high level, AWS’s cloud services can be broken down into a number of key areas:
Storage
Amazon S3 allows you to store and retrieve troves of data from anywhere. This includes the storage of any type of object, such as structured data (CSV, JSON, Parquet), unstructured data, log files, Web File, and Images, to name a few. This storage layer is key because it provides the bedrock for all your data, and its importance will continue to grow as AI systems like LLMs access various data formats.
Relational databases
Relational databases connect the dots between collections of data points and organize them into logical tables. AWS provides many relational database services such as RDS, Aurora, and Oracle Database@AWS, to name a few. Integrating these allows for more online transaction processing (OLTP) use cases, making them highly valuable for industries like financial services and e-commerce.
Cloud data warehouses
Cloud data warehouses centralize structured and semi-structured data storage for analysis, offering scalability and affordability. Amazon Redshift is an AWS service that provides a cloud data warehouse for analytical workloads, focusing on OLAP use cases.
Analytic services
As organizations prioritize data-driven decision-making, analytic services are a must-have component of their infrastructure. AWS offers a number of rich analytic services, such as Amazon SageMaker, to allow data scientists and ML engineers to build, train, and deploy ML models at scale.
How Immuta integrates with AWS
Immuta is a pioneer in scalable, enterprise-grade data provisioning solutions. A key architecture pattern is that Immuta works with the native controls of the data platforms that it is securing. This requires building an integration that allows users to easily create policies using Immuta’s no-code policy builder. The integration will automatically convert the policy logic into AWS platform-specific controls and provision access directly in the data platform. This ensures that neither query performance nor the user experience is impacted for the data consumer.
For several years, we’ve worked closely with AWS on providing data access solutions for Amazon Redshift. Today, Immuta protects a number of the world’s largest banks’ and pharmaceutical companies’ Redshift data warehouses.
More recently, Immuta widened its support to cover more AWS services.
- We deliver support for Amazon S3 Access Grants, allowing you to enforce robust governance for structured and unstructured data across your entire data lake. Data users can select their application of choice, such as SageMaker, to consume and use data in S3.
- We announced a further integration for AWS Lake Formation, which extends Immuta’s support to more services such as Amazon Athena, EMR Spark, and Redshift Spectrum. In addition, Immuta has recently released support for PostgresSQL for Amazon Aurora and RDS.
The challenge: Provisioning data across AWS services
As we’ve outlined, AWS provides data services to cover many use cases, ranging from storage of structured/unstructured data, to streaming and analytics. Its breadth of services means you can choose which you adopt and pay for, while users get to use their tools of choice to consume data.
But in today’s environment, this model presents some new complexities:
- Historically, data that was not curated and presented in a simple-to-understand report was only used by DBAs or data/ML engineers. However, with the advancement of AI-assisted copilots, the technical barriers to leveraging raw data have been removed.
- On top of that, AI agents interacting directly with the data is driving an exponential increase in data usage.
This growth in data demand means managing who can get access to data and how to implement controls consistently within each AWS service is far more challenging.
Let’s say a user wants to interact with unstructured data in SageMaker; they could use an STS token to access and modify the data they need. Or, if an analyst wanted to access a table in Athena, they could request access to a subscriber database in Amazon DataZone.
While these examples seem straightforward, both require data access to be granted in two different ways. You would potentially need to replicate the access patterns across several AWS accounts, and configure separate subscriber databases to facilitate access.
In addition, since AWS allows services to be run in isolation within accounts, all controls must be scoped to an account. This requires replicating controls across accounts. For example, if you have 10 AWS accounts, you would need to re-create global data access policies for each of those 10 accounts. The risk in this scenario is that different policy interpretations may be implemented differently inside each account.
The optionality of AWS services means that each service may manage data access differently – a clear challenge for provisioning data at scale. That’s where Immuta comes in.
The solution: Automated data provisioning via Immuta
Immuta provides a single data provisioning layer that sits across all your AWS accounts and services. This allows you to create data access policies (e.g. all data engineers get access to data tagged with Layer Bronze) that can be applied across any AWS service, so that data consumers can access data using their tool of choice (e.g. SageMaker, Python notebook).
Immuta also handles ad hoc or exception-based access requests, which can be time-consuming and resource-intensive. It does this by enabling:
- Data products to be built and published across all AWS services and other third party data platforms, such as Snowflake and Databricks
- Data Requestor to search, find, and request access to data for themselves or on behalf of someone else
- Complex approval workflows and data usage agreements to be created and implemented
- Approvers to review data access requests and grant time-bound access
- AI-assisted access reviews that provide risk ratings for each request to ensure approvers grant access safely
Whether data access is granted automatically based on a pre-defined data access policy or in response to a specific data access request, Immuta’s integrations with AWS automatically provision access without the need to manage how the controls are enforced.
How Immuta provisions data across AWS
Let’s dig in to see how Immuta provisions access to your data products in AWS.
- Connect Immuta to your AWS service(s) and register all data, whether that data is cataloged in AWS Glue or files in an S3 bucket. Immuta will register these as data sources and will keep in sync with AWS as data is added, updated, or removed.
- Immuta automatically pulls in all your users, both human and non-human, and links them to their IAM user/role.
- Create a single data product that combines structured and unstructured data across many services. This does not require moving, copying, or isolating data.
- Publish the data product and allow users to search and request access to it. The data product can be searched and viewed directly via Immuta, as well as through other tools such as Amazon DataZone.
- Approvers review the request with assistance of Immuta Review Assist and make a decision to grant, temporarily grant, or deny access.
- If access is granted, Immuta will interact with the appropriate AWS service/controls to immediately provision access for the data user.
The simplicity of this process comes from the tight integrations Immuta has with various AWS services. These integrations remove the complexity of implementing controls in different AWS services, and you don’t need to move data into an isolated database or create new IAM roles to provision it.
Looking ahead with AWS and Immuta
As AWS delivers greater flexibility in its services and in how you store, query, analyze, request, and share data, Immuta is the best tool to help overcome the data provisioning complexities that hinder many organizations.
Immuta’s native integration architecture with AWS ensures that data users can use and consume data in their AWS service of choice. Immuta’s provisioning capabilities allow non-technical staff – as well as technical stakeholders – to create and publish data products without the need for any data engineering effort or new AWS IAM permissions. As a result, data can be published and used in near-real time, ensuring that AWS investments deliver maximum efficiency and value.
We continue to strengthen our partnership through initiatives like the AWS FSI Competency, which recognizes our expertise in helping banks, insurers, capital markets, and fintechs securely unlock the value of their data in the cloud.
Read more.
Learn about Immuta's integrations with AWS services.