​How to Build a Governed RAG Agent in Snowflake

Introduction

Most “AI on your data” demos quietly run the agent under a single service role, which means every user of the agent inherits the same view of the data, and row-level and masking policies go out the window. In this blog I want to show how you can keep fine-grained access control intact through a managed agent runtime, in this case Snowflake’s Cortex Agent, without falling back to a single service role. The pattern has three parts:

  • A custom vector table instead of Cortex Search, so row-level policies apply at the data layer itself.
  • Immuta on-behalf-of (OBO), so each user session gets a temporary Snowflake role that reflects that user’s permissions, not the agent’s.
  • A role-restricted programmatic access token (PAT): the bridge credential that lets the Cortex Agent execute under the vended role.

The problem: RAG is easy until authorization matters

It is easy to build an AI agent that can search documents. Give the agent access to a document index, let it retrieve the top matching chunks, and ask an LLM to write an answer.

That is not enough for enterprise data.

In a real organization, the question is not only:

Can the agent find the right document?

The harder question is:

Is the agent allowed to show this document to this specific human?

That distinction matters a lot for RAG. A retrieval tool is effectively a data access path. If the tool can retrieve every document, then the agent can leak every document, even if the user behind the chat experience should only see a subset.

RAG & Cortex Search – in a nutshell

If you already build RAG systems, skim this section. If you don’t: retrieval-augmented generation is the trick that lets a language model answer questions about documents it has never seen during training. There are three moving parts.

1. Parse documents into clean text

LLMs read text, not PDFs. A scanned file, an email exported as pdf, a picture of a document: none of these are directly readable. Before anything else happens, every document has to be normalized into Markdown.
In Snowflake that’s one function call:

SELECT AI_PARSE_DOCUMENT(
  TO_FILE('@PDF_STAGE', 'HARBORLIGHT_2026_02_loss_run_report.pdf'),
  OBJECT_CONSTRUCT('mode', 'LAYOUT', 'page_split', TRUE),
  TRUE
);
-- returns: { pages: [ { index: 0, content: "# Loss Run Report ..." }, ... ] }

AI_PARSE_DOCUMENT handles extraction, OCR for scanned pages, and layout reconstruction, so a multi-column PDF comes out the other end as well-ordered Markdown with headings, paragraphs, and tables intact. This is the same parser Cortex Search uses under the hood; you’re just calling it yourself so you can decide what happens next.

2. Embed every chunk

Once you have text, you turn each chunk into a fixed-length vector: a list of floating-point numbers, in our case. The crucial property: two pieces of text that mean the same thing produce vectors that point in almost the same direction. We measure that with cosine similarity.

That’s the entire mathematical claim behind semantic search. Embed the text you want to make searchable. In Snowflake you can use the AI_EMBED function. Then embed each new question on the fly, sort by cosine similarity, return the top K results.

3. Let an agent call retrieval as a tool

The last piece is what makes this agentic: the LLM doesn’t pre-load every document. It receives the user’s question, decides retrieval is needed, calls a search function as if it were any other tool, search_docs(...), reads the returned chunks, and writes the final answer grounded in them.

That’s RAG. If you’ve seen Cortex Search, you’ve seen a managed version of all three steps glued together behind one CREATE CORTEX SEARCH SERVICE.

The governance gap in Cortex Search

Cortex Search is a perfectly good retrieval service. It does parsing, chunking, embedding, and indexing in one declarative object. For prototypes, I reach for it first. But it has one property that disqualifies it from the use case I care about:

Access to a Cortex Search service is governed by USAGE on the service, not by row-level policies on the underlying data.

If you can call the service at all, you can retrieve any chunk it indexed. There is no way to say “this user should only see non-sensitive data” without splitting the corpus into multiple physical services and granting USAGE per persona. That isn’t fine-grained access control.

Now let’s take an insurance case as an example. We want to create an agent that can search insurance claim documents, such as broker correspondence, statements of value, FNOL forms, and loss-run reports with paid claim amounts. The same agent should be callable by a customer service employee and a financial analyst, but each role should only see the documents it is allowed to see.

Option A · Managed

Cortex Search

  • Fastest path to a working RAG prototype: one CREATE statement.
  • No vector or chunking code to maintain.
  • Parsing handles common formats out of the box.
  • Governance limitation: access is controlled by USAGE on the service, not row-level policies.
  • Fixed extraction pipeline: no VLM, no custom OCR, no special cases.
  • Less control: chunk boundaries, embedding model, and metadata columns are managed for you.

Option B · Build it yourself

Custom vector table + UDF

  • Governance-aware metadata: add sensitivity, financial, owner, or document-type columns that drive row-level policy.
  • Immuta policies apply automatically: the LLM only sees authorized rows.
  • Choose your extractor per document type: VLM, custom OCR, layout-aware parsing, or Snowflake-native parsing.
  • Decide what gets embedded: full pages, chunks, titles, summaries, or a mix.
  • Tradeoff: more upfront engineering. You build it, and you maintain it.

To summarise, move to your own vector table the moment row-level access, custom extraction, or rich metadata starts to matter.

Insurance example in Snowflake

In the following section I will be showing an example on how you can setup your own search tool and RAG agent in Snowflake. The source documents are a synthetic insurance corpus with fictional businesses, policies, and claims.

The documents are example insurance files such as:

  • broker submission emails
  • loss run reports
  • statements of values
  • policy schedules
  • certificates of currency
  • insurance applications
  • FNOL forms
  • claim reports

Insurance is a useful domain for explaining governed RAG because not every employee should see every document. Some files contain claims history, personal data, or financial information. Other files are routine email correspondence.

The final vector table includes both text and metadata:

Column Purpose
relative_path Link back to the staged PDF
document_type Filter by business document type
insured_business_name Business context for answers
insured_industry Additional business context
insurer_name Carrier or underwriter context
sensitivity_classification Human-readable governance label
contains_financial_data Policy input
contains_claim_data Policy input
contains_person_name Policy input
page_markdown Parsed page text
page_embedding Vector used for semantic retrieval

Example rows:

relative_path document_type insured_business_name financial? sensitivity
pdfs/HARBORLIGHT_2026_01_broker_submission_email.pdf broker_submission_email Harborlight Facilities Group Pty Ltd FALSE INTERNAL
pdfs/HARBORLIGHT_2026_02_loss_run_report.pdf loss_run_report Harborlight Facilities Group Pty Ltd TRUE RESTRICTED_CLAIMS
pdfs/HARBORLIGHT_2026_04_policy_schedule.pdf policy_schedule Harborlight Facilities Group Pty Ltd TRUE CONFIDENTIAL_UNDERWRITING

That metadata is what makes row-level filtering possible. Using those documents, we can set up a pipeline in Snowflake that transforms the documents into a usable vector table with metadata that allows us to set row-level filter policies on it. The full flow then would look as follows:

Five named objects, three of them (dynamic) tables, one of them governed. Let’s walk through each.

Stage and manifest

Documents go into an internal stage. A small manifest table gives every file a deterministic document_id derived from its relative path. That MD5(relative_path) trick is what keeps the rest of the pipeline idempotent.

CREATE OR REPLACE STAGE <Database>.<Schema>.PDF_STAGE
  DIRECTORY = (ENABLE = TRUE)
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

CREATE OR REPLACE TABLE <Database>.<Schema>.STAGED_PDFS AS
SELECT
  MD5(RELATIVE_PATH) AS document_id,
  RELATIVE_PATH                AS relative_path,
  FILE_URL                     AS stage_file_url
FROM DIRECTORY(@<Database>.<Schema>.PDF_STAGE)
WHERE LOWER(RELATIVE_PATH) LIKE '%.pdf';

What that table looks like for our insurance corpus:

document_id relative_path stage_file_url
7d3e94a1… pdfs/HARBORLIGHT_2026_01_broker_submission_email.pdf sfc://stage/PDF_STAGE/…
e5f6b3c2… pdfs/HARBORLIGHT_2026_02_loss_run_report.pdf sfc://stage/PDF_STAGE/…
a1b2c3d4… pdfs/HARBORLIGHT_2026_03_statement_of_values.pdf sfc://stage/PDF_STAGE/…
9f8e7d6c… pdfs/NORTHSTAR_2026_01_broker_submission_email.pdf sfc://stage/PDF_STAGE/…
4c5d6e7f… pdfs/NORTHSTAR_2026_04_fnol_form.pdf sfc://stage/PDF_STAGE/…

Metadata is what makes governance possible

Each document contains metadata that either is derived from the source or could be classified/extracted using AI/ML. In my example I will be using a METADATA table with derived boolean flags that classify each document for governance purposes. Those flags are what row-level policies later filter on.

CREATE OR REPLACE TABLE <Database>.<Schema>.METADATA AS
SELECT
  pdf_filename,
  document_type,
  insured_business_name,
  insured_industry,
  insurer_name,
  -- the governance flags --
  IFF(document_type IN ('loss_run_report', 'fnol_form', 'claim_report'),
      TRUE, FALSE) AS contains_claim_data,
  IFF(COALESCE(annual_revenue, premium, total_declared_value,
               total_incurred, total_paid) IS NOT NULL,
      TRUE, FALSE) AS contains_financial_data,
  CASE
    WHEN document_type IN ('loss_run_report', 'fnol_form') THEN 'RESTRICTED_CLAIMS'
    WHEN COALESCE(annual_revenue, premium, …) IS NOT NULL THEN 'CONFIDENTIAL_UNDERWRITING'
    ELSE 'INTERNAL_SYNTHETIC'
  END AS sensitivity_classification
FROM <Database>.<Schema>.RAW_METADATA;

And what comes out:

pdf_filename document_type insured_business_name fin? claim? sensitivity
HARBORLIGHT_2026_01_broker_submission_email.pdf broker_submission_email Harborlight Facilities Group FALSE FALSE INTERNAL
HARBORLIGHT_2026_02_loss_run_report.pdf loss_run_report Harborlight Facilities Group TRUE TRUE RESTRICTED_CLAIMS
HARBORLIGHT_2026_03_statement_of_values.pdf statement_of_values Harborlight Facilities Group TRUE FALSE CONFIDENTIAL_UNDERWRITING
NORTHSTAR_2026_01_broker_submission_email.pdf broker_submission_email Northstar Recreation Pty Ltd FALSE FALSE INTERNAL
NORTHSTAR_2026_04_fnol_form.pdf fnol_form Northstar Recreation Pty Ltd TRUE TRUE RESTRICTED_CLAIMS

Notice the structure: every row carries its own policy hints. contains_financial_data is the column we’ll use later to keep loss runs out of the customer service employee’s results.

Parse, embed, and join in two dynamic tables

The first dynamic table parses each PDF into page-level Markdown. The second embeds every page and joins in the metadata. I deliberately kept it page-level instead of chunk-level: it’s easier to explain, it maps cleanly to AI_PARSE_DOCUMENT‘s output, and for short documents one page is the right retrieval unit. Smarter chunking is a production concern, not a tutorial concern.

CREATE OR REPLACE DYNAMIC TABLE <Database>.<Schema>.DT_VECTOR_TABLE
  TARGET_LAG = '1 day'
  WAREHOUSE = <Warehouse>
  REFRESH_MODE = FULL
AS
SELECT
  MD5(CONCAT(p.document_id, ':', p.page_index::STRING)) AS vector_row_id,
  p.document_id,
  p.relative_path,
  md.document_type,
  md.insured_business_name,
  md.insured_industry,
  md.insurer_name,
  md.sensitivity_classification,
  md.contains_claim_data,
  md.contains_financial_data,
  md.contains_person_name,
  p.page_index,
  p.page_markdown,
  AI_EMBED('snowflake-arctic-embed-l-v2.0', p.page_markdown) AS page_embedding
FROM  <Database>.<Schema>.DT_PARSED_DOCUMENT p
LEFT JOIN WIM_DEMO.INSURANCE_DEMO.METADATA md
  ON SPLIT_PART(p.relative_path, '/', -1)
     IN (md.pdf_filename, md.scanned_pdf_filename);

This is the table everything else hangs off. One row per PDF page, original Markdown alongside its embedding, every governance flag inherited from METADATA. This is also the table Immuta will attach row-level policies to.

vector_row_id document_type insured_business_name page page_markdown preview page_embedding fin? claim?
a3f9… broker_submission_email Harborlight Facilities Group 0 # Renewal submission · Harborlight Facilities Group Pty Ltd… [0.124, -0.087, 0.412, …] FALSE FALSE
b7c1… loss_run_report Harborlight Facilities Group 0 # Loss Run Report · Period: 2020-2025 · 4 incidents… [0.211, 0.103, -0.067, …] TRUE TRUE
b7c2… loss_run_report Harborlight Facilities Group 1 | Date | Description | Paid | Incurred | … 2023-02-28 … [0.198, 0.142, -0.051, …] TRUE TRUE
c4d8… statement_of_values Harborlight Facilities Group 0 # Statement of Values · Total declared value: USD 3,615,000… [0.156, -0.034, 0.289, …] TRUE FALSE
d2e5… broker_submission_email Northstar Recreation Pty Ltd 0 # Target inception 02/12/2025 · Atlas Risk Advisors to Meridian Mutual… [0.087, -0.134, 0.376, …] FALSE FALSE
e9f1… fnol_form Northstar Recreation Pty Ltd 0 # First Notice of Loss · Reporter: M. Vale · Estimated loss: USD 26,750… [0.223, 0.094, -0.018, …] TRUE TRUE

Why this matters

Because the embeddings, the page text, and the governance flags all live on the same row, any row-level policy expressed on contains_financial_data or sensitivity_classification automatically filters retrieval. The agent has no special path around the policy; it can only retrieve rows that SELECT * would return for the current session.

Retrieval, as a UDF

With the vector table in place, retrieval is only one SQL query: embed the user’s question, score every row by cosine similarity, sort, take the top K. I wrote it as a table function so it can be used as a tool for my agent. I even added a variable DOCUMENT_TYPE_FILTER that allows my agent to filter on specific document types, making the search capability more complex

CREATE FUNCTION <Database>.<Schema>.SEARCH_INSURANCE_DOCUMENTS(
  "QUERY" VARCHAR,
  "TOP_K" NUMBER,
  "DOCUMENT_TYPE_FILTER" VARCHAR
)
RETURNS TABLE (
  relative_path VARCHAR,
  document_type VARCHAR,
  insured_business_name VARCHAR,
  insured_industry VARCHAR,
  insurer_name VARCHAR,
  page_index NUMBER,
  similarity FLOAT,
  page_markdown VARCHAR
)
AS
$$
  WITH scored AS (
    SELECT
      relative_path,
      document_type,
      insured_business_name,
      insured_industry,
      insurer_name,
      page_index,
      VECTOR_COSINE_SIMILARITY(
        page_embedding,
        AI_EMBED('snowflake-arctic-embed-l-v2.0', "QUERY")
      ) AS similarity,
      page_markdown
    FROM <Database>.<Schema>.DT_VECTOR_TABLE
    WHERE UPPER("DOCUMENT_TYPE_FILTER") = 'ALL'
       OR document_type = "DOCUMENT_TYPE_FILTER"
  )
  SELECT
    relative_path,
    document_type,
    insured_business_name,
    insured_industry,
    insurer_name,
    page_index,
    similarity,
    page_markdown
  FROM scored
  QUALIFY ROW_NUMBER() OVER (ORDER BY similarity DESC) <= "TOP_K"
  ORDER BY similarity DESC
$$;

The FROM DT_VECTOR_TABLE clause runs under the current session’s role, so whatever row-level policy Immuta has attached to that table fires before the cosine similarity is even computed. The agent doesn’t “rank then redact.” It never sees the redacted rows in the first place.

Calling it looks like this:

SQL · calling the UDF

SELECT *
FROM TABLE(WIM_DEMO.INSURANCE_DEMO.SEARCH_INSURANCE_DOCUMENTS(
  QUERY               => 'claims history for Harborlight Facilities Group',
  TOP_K               => 5,
  DOCUMENT_TYPE_FILTER => 'ALL'
));

And the results, ranked by similarity:

Result · search ranked by cosine similarity (executed as a privileged role, no policy applied)

# relative_path document_type insured_business page similarity
1 pdfs/HARBORLIGHT_2026_02_loss_run_report.pdf loss_run_report Harborlight Facilities Group 0 0.781
2 pdfs/HARBORLIGHT_2026_02_loss_run_report.pdf loss_run_report Harborlight Facilities Group 1 0.762
3 pdfs/HARBORLIGHT_2026_01_broker_submission_email.pdf broker_submission_email Harborlight Facilities Group 0 0.428
4 pdfs/HARBORLIGHT_2026_03_statement_of_values.pdf statement_of_values Harborlight Facilities Group 0 0.366
5 pdfs/NORTHSTAR_2026_04_fnol_form.pdf fnol_form Northstar Recreation Pty Ltd 0 0.241

The two loss-run pages dominate the ranking. That is exactly what you’d want for that question, and exactly what you don’t want a customer service employee to see. Hold that thought.

Creating the Cortex Agent

The Cortex Agent stays thin: it decides when to search, calls Search_insurance_docs, and writes the answer from the returned document pages. The retrieval logic stays in Snowflake SQL.

The agent tool has three required inputs. When no document filter is needed, the agent passes ALL.

{
  "query": "email conversations about claims and renewals",
  "top_k": 10,
  "document_type_filter": "broker_submission_email"
}

A compact agent specification looks like this:

{
  "models": {
    "orchestration": "auto"
  },
  "instructions": {
    "orchestration": "You are an insurance support agent. Use Search_insurance_docs to retrieve document pages. Only answer from returned tool results."
  },
  "tools": [
    {
      "tool_spec": {
        "type": "generic",
        "name": "Search_insurance_docs",
        "description": "Searches embedded insurance document pages.",
        "input_schema": {
          "type": "object",
          "properties": {
            "query": { "type": "string" },
            "top_k": { "type": "number" },
            "document_type_filter": { "type": "string" }
          },
          "required": ["query", "top_k", "document_type_filter"]
        }
      }
    }
  ],
  "tool_resources": {
    "Search_insurance_docs": {
      "type": "function",
      "identifier": "<Database>.<Schema>.SEARCH_INSURANCE_DOCUMENTS_AGENT",
      "execution_environment": {
        "type": "warehouse",
        "warehouse": "<Warehouse>"
      }
    }
  }
}

Create or replace the agent with that specification:

CREATE OR REPLACE AGENT

The agent receives only the wrapper function as a tool. It does not get direct table access and it does not use Cortex Search. Retrieval always goes through the UDF path reading DT_VECTOR_TABLE, where Immuta policies can filter rows.

The governance problem with agents

Stop and look at what we’ve built. We have a UDF that respects row-level policies. We have an agent tool that wraps that UDF and we have a Cortex Agent that can use that UDF. Good. But how does the agent connect to Snowflake? The default answer is to give the agent a service account, slap a “service-role” role on it, grant that role SELECT on the vector table, and call it a day. The problem with that pattern is straight-line obvious once you see it:

If the agent is always the same role, then every user of the agent gets the same view of the data. Row-level policy is functionally bypassed.

Your customer service employee inherits the financial analyst’s view. Your auditor sees “agent_service_user ran 47,000 queries last week” with no human attached. Every governance investment you made upstream evaporates the moment a model starts driving the queries. The fix is to keep the human as the security anchor and let the agent borrow their context.

That’s exactly what Immuta on-behalf-of is for. The flow:

  1. The user authenticates to the chat application. Their Immuta user-id is now known to the backend.
  2. The backend calls Immuta’s OBO endpoint with the agent’s API key and the human’s user-id, asking for a temporary Snowflake role.
  3. Immuta vends a short-lived Snowflake role that reflects exactly that user’s policy context, including masking and row filtering.
  4. The backend creates a Snowflake PAT for the service user, restricted to the vended role.
  5. The backend calls the Cortex Agent over REST, authenticating with that PAT.
  6. The Cortex Agent executes the search tool as the service user, but under the role-restriction, so every SELECT is evaluated under the vended role and Immuta’s policies fire.

The middle two steps are the interesting bit. Immuta gives you a temporary Snowflake role; Snowflake gives you a way to bind that role to a credential. The role-restricted PAT is the engineering glue between them.

The role-restricted PAT, in detail

Snowflake PATs support ROLE_RESTRICTION, which specifies the role used for privilege evaluation and object creation. Snowflake also notes that secondary roles are not used for a role-restricted PAT:

https://docs.snowflake.com/en/sql-reference/sql/alter-user-add-programmatic-access-token

The PAT creation SQL looks like this:

ALTER USER IF EXISTS <agent_service_user>
  ADD PROGRAMMATIC ACCESS TOKEN <token_name>
  ROLE_RESTRICTION = '<vended_role>'
  DAYS_TO_EXPIRY = 1
  COMMENT = 'Immuta OBO Cortex Agent PAT';

For a service user, Snowflake requires a role restriction by default unless authentication policy settings change that behavior. Snowflake’s PAT documentation also calls out that service users need a network policy to generate or use PATs by default:

https://docs.snowflake.com/en/user-guide/programmatic-access-tokens

That makes the PAT more than just a token. It becomes a short-lived, role-constrained handoff artifact between:

  • Immuta as the policy and OBO system
  • Snowflake as the data platform
  • Cortex Agent as the managed agent runtime

Required Snowflake setup details

These are the practical setup details that mattered.

1. PAT admin privilege

The backend service role needs permission to create PATs for the service user:

CREATE ROLE <pat_admin_role>;

GRANT ROLE <pat_admin_role> TO USER <agent_service_user>;

GRANT MODIFY PROGRAMMATIC AUTHENTICATION METHODS
  ON USER <agent_service_user>
  TO ROLE <pat_admin_role>;

Snowflake documents MODIFY PROGRAMMATIC AUTHENTICATION METHODS as the privilege needed when generating a PAT for another user or service user.

2. Network policy

In order to be able to use the PAT token, we have to set a network policy. For quick validation, a permissive user-level network policy may be enough:

CREATE OR REPLACE NETWORK POLICY <agent_service_user>_ALLOW_ALL
  ALLOWED_IP_LIST = ('0.0.0.0/0')
  COMMENT = 'Allow-all policy for service-user PAT auth';

ALTER USER <agent_service_user>
  SET NETWORK_POLICY = <agent_service_user>_ALLOW_ALL;

An allow-all network policy is not a production recommendation. In production, restrict the allowed network ranges.

3. Default role and warehouse

Cortex Agent execution also depended on the service user having:

  • a default role
  • a default warehouse
  • usage on the Agent, database, schema, and function

Calling the Cortex Agent

This is the step that most “OBO for agents” architectures handwave past. The Cortex Agent :run endpoint is an authenticated Snowflake REST call, so it needs a credential. Immuta gives you a temporary role, not a temporary credential. So you build the credential yourself. The OBO request itself is a single POST:

HTTP · Immuta OBO role vending POST

POST https://<tenant>.hosted.immutacloud.com/bim/agent/obo/roles
Authorization: Bearer <AGENT_ACCESS_TOKEN>
Content-Type: application/json

{
  "userid":    "[email protected]",    // the human
  "technology": ["snowflake"],
  "ttl":       "5m"
}

This POST is triggered by your agent service or backend, not by Cortex Agent itself. A typical sequence is: the human user logs into your chat application, your backend maps that session to the user’s Immuta identity, and just before calling Snowflake it asks Immuta to vend an OBO role for that specific agent-and-user session.

Note the separation: the agent is identified by the Bearer token, which is the API key for the registered Immuta agent. The human is identified by the userid in the payload. Immuta responds with the name of the freshly vended Snowflake role, for example IMMUTA_OBO_CS_EMPLOYEE_a1b2c3. That role is the value you carry into Snowflake as the PAT ROLE_RESTRICTION. The role has already been granted to your agent service user and already represents the policy context Immuta calculated for that human and that agent.

SQL · creating the role-restricted PAT

ALTER USER agent_service_user
  ADD PROGRAMMATIC ACCESS TOKEN alice_session_token
    ROLE_RESTRICTION = 'IMMUTA_OBO_CS_EMPLOYEE_a1b2c3'
    DAYS_TO_EXPIRY   = 1
    COMMENT = 'Immuta OBO session for alice.cs';
-- secret is returned exactly once in the result set; capture it now or never.

That single SQL statement is the heart of the pattern. Three properties make it work:

  • For service users, ROLE_RESTRICTION is required unless Snowflake authentication policy settings change that default. That makes it much harder to accidentally create an unrestricted PAT.
  • Privilege evaluation happens under the restricted role only. Secondary roles are not used, so there is no fallback to a broader role.
  • The role must already have been granted to the service user. The PAT does not elevate privilege; it scopes the service user down to the vended role.

Then call the agent:

HTTP · calling Cortex Agent with the PAT

POST https://<account>.snowflakecomputing.com/api/v2/cortex/agent:run
Authorization: Bearer <pat_secret>
Content-Type: application/json

{
  "agent":    "<Database>.<Schema>.INSURANCE_AGENT",
  "messages": [{ "role": "user", "content": "Summarize Harborlight's claims history." }]
}

The agent runs under agent_service_user, but every SELECT it issues, including the one inside our search UDF, is evaluated as if the session were IMMUTA_OBO_CS_EMPLOYEE_a1b2c3. Immuta’s row-level filter fires before cosine similarity is even computed.

One honest caveat. This is a pragmatic composition, not a “pure” OBO flow. The direct Snowflake connector OBO pattern is simpler and more canonical. The PAT-based pattern earns its keep when you want to keep the managed Cortex Agent runtime, including streaming SSE, tool orchestration, and citations, while still respecting per-user policy.

The Insurance Agent Example

Time to close the loop. Imagine a row-level policy attached to DT_VECTOR_TABLE in Immuta that says, in plain English:

“If the current role is not in the finance group, only return rows where contains_financial_data = FALSE.

That single policy is the whole game. Now, when Bob (a customer service employee) and Ben (a financial analyst) each ask the agent the same question, the OBO flow vends a different role for each, the PAT scopes the agent’s session accordingly, and the UDF returns a different result set per session before the LLM has written a single word.

Bob · Customer Service Employee

Customer Service · restricted

  • Vended role: IMMUTA_OBO_CS_EMPLOYEE_a1b2c3
  • Active policy: Hide rows where contains_financial_data = TRUE
  • Question: “Summarize Harborlight’s claims history.”
  • Found only non financial documents

Agent answer

“Based on the available documentation, here’s a summary of Harborlight Facilities Group Pty Ltd’s claims history:

Harborlight Facilities Group Pty Ltd (a facilities maintenance company based in Geelong, VIC) has 4 reported claims over the last 5 policy years, as noted in their broker submission for renewal effective 01/07/2026.

Unfortunately, I’m unable to access the detailed loss run report or individual claim reports that would provide specifics on claim types, amounts, and outcomes. The submission references these as attachments, but I cannot retrieve them.

In summary, with 4 claims over 5 years for a business that has been operating for 18 years across 3 locations, the claims frequency appears moderate. If you need more detailed information on individual claim amounts or circumstances, I’d recommend requesting the loss run report directly.”

Ben · Financial Analyst

Financial Analyst · authorised

  • Vended role: IMMUTA_OBO_FIN_ANALYST_d4e5f6
  • Active policy: No restriction on financial or claim columns
  • Question: “Summarize Harborlight’s claims history.”
    Found financial documents

Agent answer

“Here’s a short summary of Harborlight Facilities Group Pty Ltd’s claims history:

Harborlight Facilities Group (a facilities maintenance company based in Geelong, VIC) has 4 claims on record spanning 2021 to 2024, with a total incurred cost of USD $142,985:

Claim No. Loss Date Category Status Incurred
HFG-2021-18420 17/11/2021 Water Damage Closed $18,920
HFG-2022-50918 14/06/2022 Equipment Damage Closed $34,775
HFG-2023-73104 28/02/2023 Public Liability Closed $61,430
HFG-2024-26651 19/10/2024 Storm Damage Open $27,860

Key observations:

  1. Three of the four claims are closed with no outstanding reserves.
  2. The largest single loss was a public liability incident in February 2023 (USD $61,430), which was investigated and subsequently closed.
  3. One claim remains open: storm damage from October 2024 (USD $27,860).
  4. The claim frequency is roughly one per year over the four-year period.”

What just happened, mechanically:

Inside the UDF · rows considered for ranking, by session

Vector row Document type Page Financial? Claims? Bob
Customer Service
Ben
Financial Analyst
a3f9… broker_submission_email 0 FALSE FALSE Visible Visible
a3fa… broker_submission_email 1 FALSE FALSE Visible Visible
b7c1… loss_run_report 0 TRUE TRUE Filtered Visible
b7c2… loss_run_report 1 TRUE TRUE Filtered Visible
c4d8… statement_of_values 0 TRUE FALSE Filtered Visible
e9f1… fnol_form 0 TRUE TRUE Filtered Visible

The agent didn’t refuse to tell Bob about the loss run. It never saw it. The cosine similarity computation in his session was performed over three rows, not six. His top-K was bounded by what his role could see; his LLM context was bounded by his top-K; his answer was bounded by his LLM context. Policy enforced at the data layer, propagated all the way to the response, with the managed Cortex Agent runtime sitting in the middle.

What I’d be honest about

Three things I’d want you to take away as caveats.

1 · You don’t always need a PAT token

A PAT is not a requirement of Immuta OBO itself. If you own the agent runtime, you can also connect to Snowflake directly with a Snowflake connector and service-principal or service-user credentials, then make sure the session uses the ephemeral role that Immuta vended for the human user. In that architecture you are not authenticating to Snowflake’s REST API, so you do not need a PAT.

The PAT becomes useful in this demo because I call the Cortex Agent over Snowflake’s REST API. That API needs a bearer credential, and a role-restricted PAT is a clean way to bind that credential to exactly one vended role. In return for the extra setup, you keep the managed Cortex Agent runtime, including streaming responses, tool orchestration, and citations.

2 · It works with any agent

It doesn’t matter whether your agent runs in Cortex, OpenAI, Azure, AWS, or locally. The idea behind this blog was to show how you can build a search tool for your agent that is governed by Immuta. You can give this tool to any agent, as long as the call path preserves Immuta’s OBO context and the SQL executes under the vended role. That is what makes sure the agent only uses the documents it is allowed to see.

3 · Auditability is essential

In some scenarios, we could also pass the user’s credentials directly into our connector and call the SEARCH_INSURANCE_DOCUMENTS function. The problem is auditability: it becomes harder to distinguish data accessed by the user directly from data accessed by an agent acting for that user. With Immuta’s on-behalf-of feature, Immuta can link the vended role back to the human user and the registered agent. That lets us distinguish between the user and the agent acting on behalf of that user. It also lets us define guardrail policies on agents, so the agent itself has a governed access envelope.

The takeaway

The interesting thing isn’t that an AI agent can answer questions over insurance documents. The interesting thing is preserving a governed, user-specific data access context all the way through a managed agent runtime. The exact setup depends on the use case, but this pattern shows one practical way to set up a governed RAG agent in Snowflake.

Source documentation