Sovereign Intelligence: Deploying Air-Gapped Agentic Workflows for Financial Compliance
Chetan Hirapara
Presented By
Lead Data Scientist @Teradata
About Me
SESSION ROADMAP
What We're Building Today
01
The Compliance Crisis
$61B problem — why AI is the only answer
04
Live Demo: $50K Wire Fraud
Multi-agent debate in action
02
The Financial AI Trilemma
Privacy vs. Reasoning vs. Auditability
05
Open Source vs Proprietary
2026 CTO decision framework
03
The Sovereign Stack
Open-source architecture deep dive
06
Build vs Buy Calculus
ROI, hardware and team requirements
"In Finance, the model is the engine.
Privacy is the chassis."
Without open source, you're renting intelligence from a competitor.
01
SOVEREIGNTY
Your data. Your weights. Your logic. Owning your AI stack is your competitive moat.
02
TRANSPARENCY
Open weights means every decision is reproducible. Full audit trails by design.
03
ADVERSARIALISM
Agents that challenge each other are more reliable than any single model.
04
DETERMINISM
Banking cores need schemas. Guardrails AI turns text into certified structured data.
PRINCIPLES
The Business Case & Strategic Why
THE PROBLEM — 2026 DATA
The $61 Billion Compliance Trap
Financial institutions spend a fortune on manual KYC/AML — and are still getting fined billions.
$61B
Annual AML/KYC cost
(US + Canada)
$72.9M
Average per-firm spend
on KYC/AML ops
$3.8B
Global regulatory fines
in 2025
417%
YoY AML penalty surge
H1 2025 vs H1 2024
99% of US/Canada firms reported rising compliance costs
TD Bank fined $1.3B — largest FinCEN penalty in history
Only 4% of SARs ever receive law enforcement follow-up
AI-driven KYC cuts costs 40–70% and halves false positives
THE ROOT CAUSE
Why Single-Prompt LLMs Fail at Finance
Replacing monolithic prompts with a team of specialized adversarial agents is the key breakthrough.
Single Model — The Old Way
Hallucinates on multi-step financial reasoning
Cannot reliably critique its own logic
No separation of legal vs. quantitative tasks
Probabilistic output — banks need deterministic schemas
Single context — no adversarial validation layer
VS
Agentic Team — The Sovereign Way
Distributes cognitive load across specialist agents
Auditor Agent challenges every finding adversarially
Legal Scholar + Data Quant work in parallel
Multi-agent debate achieves mathematical & legal consensus
Guardrails AI enforces deterministic schema on output
Result: Verified, consensus-driven output — with a full audit trail of every agent's reasoning step.
CORE CHALLENGE
The Financial AI Trilemma
You can't pick two. The Sovereign Stack delivers all three simultaneously.
Agentic
Reasoning
Multi-step logic. Dynamic financial analysis.
Impossible with single-prompt LLMs.
Absolute
Privacy
Zero cloud exposure. Air-gapped on-premise.
Transaction data never leaves your data center.
Audit-Grade
Output
Deterministic schemas. Immutable audit logs.
Banking-core-compatible structured JSON.
?
The unsolved
intersection
Until now.
The Mandate: Bring autonomous, reasoning-capable AI entirely within the private data center — without sacrificing speed or deterministic reliability.
The
Sovereign
Quant
Building Private Agentic Workflows with Open Source LLMs
Data Sovereignty
Agentic Reasoning
Audit-Grade Output
CONFIDENTIAL · FOR CONFERENCE DISTRIBUTION ONLY
The Architecture
AGENT ARCHITECTURE
Meet Your Compliance Team
Four specialist AI agents that debate, challenge, and validate until consensus is reached.
Controller Agent
THE SUPERVISOR
Central router. Ingests the initial prompt, delegates sub-tasks, and synthesizes the final consensus output.
LangGraph / AutoGen
Legal Scholar Agent
THE REGULATOR
Grounded in Qdrant vector DB. Cross-references SEC filings, FINRA Rule 3310, KYC policy docs, live OFAC sanctions.
BM25 Hybrid Search + RAG
Data Quant Agent
THE MATHEMATICIAN
Parses transaction logs. Recalculates margins. Identifies statistical anomalies across 128K+ token contexts.
Llama 4 Scout (10M ctx)
Auditor Agent
THE ADVERSARY
The internal skeptic. Aggressively challenges all agent findings. Triggers retry loops when logical flaws found.
Multi-Agent Debate Protocol
MULTI-AGENT DEBATE & REASONING LOOP
ZERO-TRUST PRIVACY
The Filtration Funnel: PII Never Reaches the LLM
By architecture, not policy — raw transaction data is anonymized before any agent ever sees it.
UNTRUSTED
Raw Data In
Transaction logs with full PII: names, account numbers, SSNs, beneficiary data, addresses.
>
FILTERING
Local Presidio
(NER/Regex)
Runs 100% offline. Auto-identifies and redacts PII. No network calls. Zero external exposure.
>
TRUSTED
Anonymized Tokens
to Agents
Agents reason over anonymized structures only. Zero memory-leakage risk across all debate loops.
Raw transaction data never reaches any language model — by architecture, not policy
Local Presidio operates entirely offline as NER/Regex mesh — zero network calls
Multi-agent loop reasons only over anonymized tokens — no memory leakage possible
DeepSeek & Llama run air-gapped — models never phone home to any external API
LIVE DEMO SCENARIO
Demo: The Suspicious $50K Wire Transfer
Walking through a real compliance check — step by step, agent by agent.
TRIGGER: $50,000 wire transfer flagged. Sender: unknown shell company. Destination: offshore jurisdiction.
1
Controller Agent →
Request Ingested
Controller decomposes the task and routes parallel queries to all sub-agents simultaneously.
2
Data Quant Agent →
Mathematical Verification
Parses logs. Finds pattern: 3 transfers in 48h, each just below $10K structuring threshold.
3
Legal Scholar →
Entity & Sanctions Check
Searches Qdrant + OFAC + adverse media. Result: entity flagged on internal watchlist.
4
Controller Agent →
Draft Report Submitted
Synthesizes findings. Drafts "Proceed with Enhanced Due Diligence." Submits to Auditor.
5
Auditor Agent →
REJECTION — Retry Forced
Auditor: Legal Scholar watchlist hit was underweighted. Forces debate loop iteration #2.
6
All Agents →
Consensus Reached
Final: TRADE BLOCKED – Compliance Violation. Schema validated. Audit JSON delivered to core.
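The structuring pattern the Data Quant flags in step 2 can be sketched as a simple sliding-window check. A minimal sketch; the record shape, the 90% "just below threshold" band, and the helper name are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Sketch of the structuring heuristic from step 2: several transfers,
# each just below the $10K reporting threshold, within a 48-hour window.
# Thresholds and the transaction record shape are assumptions.
THRESHOLD = 10_000
WINDOW = timedelta(hours=48)
MIN_COUNT = 3

def flag_structuring(transfers: list[dict]) -> bool:
    # Keep only transfers in the suspicious band just under the threshold
    near = sorted(
        (t for t in transfers if 0.9 * THRESHOLD <= t["amount"] < THRESHOLD),
        key=lambda t: t["ts"],
    )
    # Any MIN_COUNT consecutive near-threshold transfers inside the window?
    for i in range(len(near) - MIN_COUNT + 1):
        if near[i + MIN_COUNT - 1]["ts"] - near[i]["ts"] <= WINDOW:
            return True
    return False
```

On the demo data (three $9.5K transfers within 48h) this returns True, which is exactly the pattern the Data Quant reports.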
SYSTEM OUTPUT
The Final Payload: Dual-Output Architecture
Every compliance run produces two artefacts — one for machines, one for auditors.
Structured JSON for Banking Core
{
  "transaction_id": "TXN-987654321",
  "status": "BLOCKED",
  "risk_score": 0.94,
  "compliance_checks": {
    "aml_screening": "FAIL",
    "kyc_verification": "PASS",
    "sanctions_list": "MATCH",
    "agent_consensus": true,
    "debate_iterations": 2
  },
  "audit_trail": {
    "decision_ts": "2026-03-31T14:22:01Z",
    "proof_id": "PROOF-A1B2C3D4E5"
  }
}
✅ SCHEMA VALIDATED — AUDIT READY
Audit-Grade Risk Report
Transaction Assessment
TXN-987654321 HIGH RISK. Multi-agent adversarial debate: 2 iterations. Final consensus: BLOCKED.
Compliance Checks
AML screening FAIL. KYC verification PASS. OFAC sanctions MATCH found. FINRA Rule 3310 reviewed.
Mathematical Proofs
Theorem 1.1: ∀t ∈ T, V(t) ∈ {Approved, Rejected, Flagged}. Agent consensus verified via GRPO.
Debate Log
Loop #1 rejected by Auditor: Legal Scholar watchlist hit underweighted. Loop #2 consensus reached.
REAL-WORLD EVIDENCE
Case Studies: Agentic AI Cutting Compliance Costs
Early adopters of agent-based AML/KYC architectures are already seeing measurable results.
Tier-1 US Investment Bank
Problem: 16 million AML alerts per year. Only 4% warranted follow-up. Analysts spending 60% of time on false positives.
Solution: Multi-agent triage: Quant agent pre-scores transactions. Legal Scholar cross-checks against watchlists. Auditor approves or escalates.
Result: False positive rate reduced by 52%. Analyst capacity freed by 40%. Annual compliance opex down $18M.
LangGraph · DeepSeek R1 70B · Qdrant · Guardrails AI
European RegTech Consortium
Problem: 3 banks sharing AML intelligence. Cannot share raw customer data due to GDPR. Needed collaborative detection without data pooling.
Solution: Federated learning via Flower.ai. Each bank trains locally. Gradient averaging produces a shared detection model.
Result: Cross-institution fraud detection improved 38%. Zero customer data shared. GDPR compliance maintained throughout.
Flower.ai · Federated Learning · Differential Privacy ε=8
Singapore Digital Bank
Problem: 50,000+ scam cases in 2025. Rapid fund movements outpaced traditional rules-based detection tools.
Solution: Real-time agentic pipeline: transaction arrives → PII stripped via Presidio → Quant + Legal agents analyse in parallel → output in <2s.
Result: Detection latency reduced from 4 hours to 1.8 seconds. Scam interception rate improved by 61%.
Local Presidio · vLLM · TensorRT-LLM · Redis
Sources: Fenergo 2025, Napier AI 2025, Silenteight Q4 2025, LexisNexis Global Financial Crime Compliance 2024
Open Source vs. Proprietary: The CTO View
The capability gap has closed. The deployment trade-offs have not.
Capability | Open Source (DeepSeek-V3 / Llama 4) | Proprietary (GPT-5 / Claude Opus 4.6)
Data Privacy | ✅ Full On-Prem / VPC | ⚠️ Trust agreements only
Fine-Tuning | ✅ Deep — LoRA, full param | ❌ API-based only, limited
Cost at Scale | ✅ Low OpEx after GPU invest | ❌ High & unpredictable
Model Transparency | ✅ Full weights & logic visible | ❌ Black box — no auditability
Reasoning (MATH-500) | ✅ 97.3% DeepSeek R1 (MIT) | ✅ ~98% GPT-5 / o3
Context Window | ✅ 10M tokens (Llama 4 Scout) | ✅ 1M tokens (GPT-5)
Vendor Lock-in | ✅ None — weights are yours | ❌ Full API dependency
Regulatory Audit Trail | ✅ Fully reproducible runs | ⚠️ Limited — provider logs only
Inference Cost /1M tk | ✅ $0.14 (DeepSeek API) | ❌ $2.50–$15.00 (GPT-5/Claude)
"In Finance, the model is the engine — but Privacy is the chassis. Without open source, you're renting intelligence from a competitor."
IMPLEMENTATION GUIDE
Your 90-Day Sovereign Stack Roadmap
Three phases. One air-gapped, production-ready compliance engine.
PHASE 1
Days 1–30
Foundation
MILESTONE
LLM answering domain queries fully on-premises
PHASE 2
Days 31–60
Orchestration
MILESTONE
First end-to-end compliance check runs locally
PHASE 3
Days 61–90
Harden & Audit
MILESTONE
Production-ready, audit-grade output certified
Stack: vLLM · LangGraph · Qdrant · Local Presidio · Guardrails AI · Kong · DeepSeek R1 / Llama 4 Scout · Redis · Apache Hop
BUSINESS CASE
Build vs. Buy: The ROI Math
AI-driven compliance can cut costs 40–70%. The numbers justify the GPU investment.
40–70%
Cost reduction
via AI-driven KYC
50%
Reduction in
false positive alerts
$23.4B
US savings potential
from AI compliance
8× H100
GPUs needed for
full DeepSeek R1
OPEN SOURCE MODEL LANDSCAPE
2026 State of Open Models for Finance
The capability gap with proprietary models has largely closed. The cost gap has not.
DeepSeek R1
MIT ✅
671B MoE · 37B active
▸ 97.3% MATH-500 — highest open model score
▸ Chain-of-thought reasoning transparency
Llama 4 Scout
Llama 4 Community
109B total · 17B active
▸ 10M token context — ingest entire 10-K in one shot
▸ Single H100 with quantization
DeepSeek V3.2
MIT ✅
671B MoE · MIT license
▸ GPT-5-level reasoning on coding + math
▸ Integrated thinking in tool-use workflows
💡 10× cost drop/year: GPT-4-equivalent reasoning cost $20/1M tokens in 2023 vs $0.14 today with open models.
INFRASTRUCTURE
Deployment Methods: vLLM vs TGI vs Ollama
Feature | 🚀 vLLM | 🛠️ TGI | 📦 Ollama
Core Philosophy | High-Throughput Engine | K8s Model Serving | Localhost Simplicity |
Best For | Multi-agent parallel chatter | Diverse architectures & HF-native tooling | Local dev, prototyping, & setup speed |
Optimization | PagedAttention & Dynamic Batching | Quantized weights & streaming output | Auto-downloading & simple CLI |
Hardware | Multi-GPU / Enterprise (A100/H100) | Native Docker & K8s scaling | Mac, Windows, Linux (Consumer & Pro) |
API | OpenAI-compatible | Deep HF ecosystem integration | Simple local REST endpoints |
Strategic Guidance:
COMPUTE REALITY
The Inference Imperative: Agentic Is Compute-Hungry
A single compliance query triggers dozens of parallel LLM calls. Hardware must keep up.
Runtime Comparison Matrix
Metric | vLLM | TensorRT-LLM
Best For | Multi-agent parallel chatter | Time-sensitive quant queries
Optimization | Dynamic continuous batching | Minimal latency, deep HW integration
GPU Support | Multi-GPU, any CUDA | NVIDIA-specific, precision kernels
Throughput | High — variable request sizes | Maximum — fixed precision kernels
Memory | Redis KV shared state | Redis KV shared state
Shared Memory Layer: Redis (Short-Term State)
Both runtimes connect to Redis for sub-millisecond state retrieval — agents remember debate context without constant prompt reloading.
Hardware Tiers
Full (DeepSeek R1 671B)
8× A100 80GB or H100
~$20K/mo
Mid (Llama 4 Maverick)
4× H100 or A100 40GB
~$8K/mo
Entry (Llama 4 Scout)
1× H100 + quantization
~$3.5K/mo
Edge (R1 70B distil)
2× A100 or 4× RTX 4090
~$1.5K/mo
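The tier sizing above follows from a rough VRAM rule of thumb: weights ≈ parameters × bytes per parameter, plus headroom for KV cache and activations. A sketch; the 20% overhead factor is an assumption for planning only:

```python
# Rule-of-thumb VRAM sizing behind the hardware tiers: parameter count
# (billions) times bits per weight, plus ~20% overhead for KV cache and
# activations. Planning estimate only, not a measured figure.
def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    return round(params_b * bits / 8 * overhead, 1)

# 70B distil at 4-bit AWQ: 70 * 0.5 * 1.2 = 42.0 GB
# -> fits the Edge tier's 2x A100 comfortably
print(vram_gb(70, 4))
# 109B Llama 4 Scout at 4-bit -> a single 80GB H100 is tight
print(vram_gb(109, 4))
```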
Challenge: Sub-second latency for multi-agent chatter — entirely within local hardware constraints.
INFRASTRUCTURE
Deploying DeepSeek R1 with vLLM: From Zero to Serving
Production-ready LLM serving on-premises in under 30 minutes.
● ● ●
# 1. Pull the vLLM serving image
docker pull vllm/vllm-openai:latest
# 2. Serve DeepSeek R1 (70B distil, 2× A100)
docker run --gpus all \
-p 8000:8000 \
-v /models:/models \
vllm/vllm-openai:latest \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--quantization awq \
--max-model-len 131072 \
--tensor-parallel-size 2
● ● ●
# 3. Test endpoint (OpenAI-compatible)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"messages":[{"role":"user","content":"Analyse TXN-001"}],
"max_tokens":2048}'
AWQ Quantization
4-bit quantization via AutoAWQ reduces VRAM from 140GB → 35GB with <2% accuracy loss on financial tasks.
Continuous Batching
vLLM's PagedAttention dynamically batches concurrent agent requests. Critical for multi-agent parallel chatter.
Redis KV Cache
Share the KV cache across agents via Redis. Agents referencing the same document don't re-tokenize it.
Air-Gap Networking
Run vLLM fully offline: set HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 so no Hub calls are attempted, serve weights from the local /models mount, and bind to the internal interface only (--host 10.x.x.x).
Safety, Evidence & Q&A
PII REDACTION — CODE
Local Presidio: Zero-Leak PII Redaction in Code
The exact code pattern used to anonymize transaction data before any LLM ever sees it.
import json
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # runs 100% offline
anonymizer = AnonymizerEngine()

def redact_transaction(txn: dict) -> dict:
    text = json.dumps(txn)
    # Detect PII entities in the raw text
    results = analyzer.analyze(
        text=text, language="en",
        entities=["PERSON", "IBAN", "US_SSN",
                  "US_BANK_NUMBER", "EMAIL_ADDRESS",
                  "PHONE_NUMBER", "LOCATION"],
    )
    # Replace with anonymized tokens
    anonymized = anonymizer.anonymize(
        text=text, analyzer_results=results
    )
    return json.loads(anonymized.text)
Before Presidio (RAW)
"sender": "John Doe",
"acct": "123-456-789",
"ssn": "001-01-0001",
"dest": "Cayman Islands"
After Presidio (SAFE)
"sender": "<PERSON_1>",
"acct": "<IBAN_1>",
"ssn": "<US_SSN_1>",
"dest": "<LOCATION_1>"
Anonymization is reversible by the originating system only — the LLM agents never see the mapping table.
OUTPUT INTEGRITY
Deterministic Validation: From Probabilistic to Precise
Banking cores require deterministic schemas. Guardrails AI + Pydantic bridge the gap.
1
LLM Debate Output
Language models produce probabilistic natural-language text. Raw output is expressive but schema-free and non-deterministic.
>
2
Guardrails AI + Pydantic
Acts as an unyielding validation layer. Outputs are strictly type-checked and enforced into predefined schema contracts.
>
3
Deterministic Structured Schema
Every field validated, typed, and signed. Validation failure triggers the Auditor Agent for a self-correction loop.
Sample Validated JSON Output
{ "transaction_id": "TXN-987654321",
"status": "BLOCKED", "risk_score": 0.94,
"compliance_checks": {
"aml_screening": "FAIL", "kyc_verification": "PASS",
"sanctions_list": "MATCH", "agent_consensus": true },
"proof_identifier": "PROOF-A1B2C3D4E5" }
Validation Failure → Auto-Retry
Guardrails rejects payload and triggers Auditor Agent for a self-correction debate loop.
Validation Pass → Banking Core
Dual-payload: structured JSON for legacy mainframe + immutable human-readable risk report.
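The pass/fail routing above can be sketched as a validate-or-retry loop around the Pydantic schema. A minimal sketch with an abridged field set; the `retry` callback stands in for the Auditor's self-correction debate loop:

```python
from pydantic import BaseModel, Field, ValidationError

# Trimmed stand-in for the full ComplianceReport schema; the retry
# callback models the Auditor Agent's self-correction loop.
class ComplianceReport(BaseModel):
    transaction_id: str
    status: str
    risk_score: float = Field(ge=0.0, le=1.0)

def validate_or_retry(raw: dict, retry, max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return ComplianceReport(**raw)   # pass -> banking core
        except ValidationError:
            raw = retry(raw)                 # fail -> self-correction
    raise RuntimeError("No valid payload after max attempts")

# Toy retry: clamp an out-of-range risk score back into [0, 1]
fixed = validate_or_retry(
    {"transaction_id": "TXN-1", "status": "BLOCKED", "risk_score": 1.7},
    retry=lambda r: {**r, "risk_score": min(r["risk_score"], 1.0)},
)
print(fixed.risk_score)  # 1.0
```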
Q&A PREPARATION
Hard Questions — and the Honest Answers
Prepare for these. A technically sophisticated audience will ask all of them.
DeepSeek is Chinese. Can we trust it in finance?
The model weights are open — inspect every parameter. Run air-gapped; it cannot exfiltrate data. Audit the code on GitHub. The same scrutiny applies to any model.
How do you handle model drift over time?
Periodic fine-tuning on new regulatory text via LoRA. Automated benchmark regression tests on a held-out compliance dataset before any model update goes live.
What if the Auditor Agent itself hallucinates?
Max iteration cap (default 3). If consensus is not reached, the transaction is auto-BLOCKED and escalated to a human reviewer. The system fails safe, not open.
How do you prove this to a regulator?
Every decision has an immutable PROOF-ID + full agent debate transcript. The Pydantic schema is deterministic and reproducible. You can re-run any decision from logs.
Is this cheaper than buying a SaaS compliance tool?
Above ~10M tokens/month: yes. GPU CapEx is high upfront, but OpEx drops dramatically. Fenergo/Napier contracts typically run $2–5M/year at enterprise scale.
What about latency? Banks need sub-100ms.
Agent debate is not on the hot path for real-time payments. It runs async for high-value / flagged transactions. Real-time pre-screening uses a smaller, faster 7B model.
RISK MANAGEMENT
Key Risks — and How the Stack Mitigates Them
Every architecture choice addresses a specific failure mode in production finance AI.
RISK: LLM Hallucination
MIT: Multi-Agent Debate Protocol
Auditor Agent adversarially challenges every output. No consensus = no output.
RISK: PII Data Leakage
MIT: Local Presidio NER/Regex Layer
Raw transaction data anonymized BEFORE any agent sees it. 100% offline.
RISK: Non-Deterministic Output
MIT: Guardrails AI + Pydantic
Every output type-checked against predefined contracts. Failures trigger self-correction.
RISK: External API Dependency
MIT: Air-Gapped VPC Architecture
All models run locally. No external calls. DeepSeek and Llama never phone home.
RISK: GPU Latency at Scale
MIT: vLLM + TensorRT-LLM Runtime
Dynamic continuous batching handles erratic multi-agent chatter. Sub-second latency.
RISK: Regulatory Audit Failure
MIT: Immutable Proof Identifiers
Every decision carries a cryptographic proof ID + full agent debate log.
AUDIENCE FIT
Who Should Build the Sovereign Stack?
This is for organisations where data privacy is non-negotiable and intelligence is strategic.
Build Now — You Qualify If:
You handle regulated financial data (KYC/AML/SEC)
Transaction logs CANNOT touch public APIs like OpenAI
You need a full audit trail for every AI decision
Compliance costs exceed $10M/year and still rising
You have or can provision 8+ enterprise GPUs
Your ML team has LLM deployment experience
Start Smaller If:
GPU budget <$200K — start with quantised 70B via Ollama
Team <3 ML engineers — use managed vLLM on private cloud
Use case is non-regulated — proprietary APIs may suffice
No compliance mandate yet — build a proof-of-concept first
Data privacy needs are met by contractual agreements alone
Path to the full sovereign stack remains open as you scale
ADVANCED PRIVACY LAYER
Beyond Air-Gapping: Federated Learning & Differential Privacy
For multi-institution consortia: collaborative intelligence without sharing raw data.
Federated Learning (Flower.ai)
Each bank trains on its own data. Only model gradients — not raw transactions — are shared.
1
Local Model Training
Each institution fine-tunes the base LLM on its own transaction data — never shared.
2
Gradient Aggregation
Only model weight deltas are sent to a secure aggregation server (Flower.ai).
3
Federated Averaging
Global model is updated by averaging gradients. No raw data ever leaves any institution.
4
Improved Global Model
All participants benefit from collective intelligence without sacrificing data sovereignty.
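Step 3's federated averaging is just an element-wise mean over the submitted updates. A minimal sketch of the arithmetic; real Flower.ai deployments express this through its Strategy API rather than a hand-rolled loop:

```python
# Minimal sketch of federated averaging (step 3): each bank submits a
# weight-delta dict; the aggregation server averages them key by key.
def fed_avg(updates: list[dict[str, float]]) -> dict[str, float]:
    n = len(updates)
    return {k: sum(u[k] for u in updates) / n for k in updates[0]}

banks = [
    {"w1": 0.25, "w2": -0.25},   # bank A's local update
    {"w1": 0.50, "w2": 0.25},    # bank B
    {"w1": 0.00, "w2": 0.00},    # bank C
]
print(fed_avg(banks))  # {'w1': 0.25, 'w2': 0.0}
```

No bank's raw transactions appear anywhere in this exchange, only the aggregated deltas.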
Differential Privacy (DP Noise)
Mathematical guarantee: even if an attacker sees the model, they cannot infer any single customer's data.
from opacus import PrivacyEngine

# Attach DP to the LLM fine-tuning loop
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = \
    privacy_engine.make_private(
        module=llm_model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.1,  # ε ≈ 8
        max_grad_norm=1.0,
    )
ε (epsilon) = 8 → industry standard for financial models
Smaller ε = stronger privacy guarantee but lower accuracy
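The guarantee behind ε can be stated precisely. For a training mechanism M satisfying (ε, δ)-differential privacy (the δ term is the small failure probability Opacus also tracks alongside ε):

```latex
% For any two training sets D, D' differing in a single customer
% record, and any set S of possible trained models:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

Removing any one customer therefore changes the probability of any training outcome by at most a factor of e^ε (plus δ), which is what lets the consortium share gradients without exposing individuals.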
TECHNICAL DEEP-DIVE
Building the Debate Loop: LangGraph Walkthrough
How to wire four specialist agents into a cyclic adversarial reasoning graph in ~50 lines.
● ● ●
from langgraph.graph import StateGraph, END
from typing import TypedDict

class ComplianceState(TypedDict):
    transaction: dict
    quant_analysis: str
    legal_findings: str
    audit_verdict: str
    approved: bool

graph = StateGraph(ComplianceState)
# Register specialist agents as graph nodes
graph.add_node("controller", controller_agent)
graph.add_node("quant", data_quant_agent)
graph.add_node("legal", legal_scholar_agent)
graph.add_node("auditor", auditor_agent)
graph.set_entry_point("controller")
graph.add_edge("controller", "quant")
graph.add_edge("quant", "legal")
graph.add_edge("legal", "auditor")
# Conditional edge: auditor rejects → retry via controller
graph.add_conditional_edges(
    "auditor",
    lambda s: END if s["approved"] else "controller",
)
app = graph.compile()
controller_agent()
Routes state in parallel; synthesizes results for Auditor.
data_quant_agent()
DeepSeek R1 analysis of transaction logs for anomalies.
legal_scholar_agent()
Qdrant hybrid search for SEC/FINRA & OFAC cross-refs.
auditor_agent()
Adversarial critique; sets approval only when logic is air-tight.
Max iterations = 3. Consensus failure triggers human review.
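The 3-iteration cap and human-review fallback are not shown in the graph snippet above; one way to sketch them is inside the conditional router itself. Pure Python here, with the string "END" standing in for LangGraph's END sentinel, and the `approved`/`iterations` state fields assumed:

```python
# Sketch of the auditor's routing rule with the iteration cap. "END"
# stands in for LangGraph's END sentinel; the state fields "approved"
# and "iterations" are assumptions about ComplianceState.
MAX_ITERATIONS = 3

def route_after_audit(state: dict) -> str:
    if state.get("approved"):
        return "END"                  # consensus reached: emit payload
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "human_review"         # fail safe, not open
    return "controller"               # force another debate loop

print(route_after_audit({"approved": False, "iterations": 1}))  # controller
print(route_after_audit({"approved": False, "iterations": 3}))  # human_review
```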
KNOWLEDGE BASE
Qdrant RAG Pipeline: Grounding the Legal Scholar Agent
How the Legal Scholar retrieves exact SEC clauses and policy docs at inference time.
Ingestion
Apache Hop + Airbyte pull SEC filings, FINRA rules, internal policy PDFs. Split into 512-token chunks with 50-token overlap.
>
Embedding + Indexing
Each chunk embedded with a local sentence-transformer model (BAAI/bge-m3). Stored in Qdrant with dense + sparse BM25 vectors.
>
Hybrid Search at Query
Legal Scholar agent sends query. Qdrant runs dense vector similarity AND BM25 keyword match. Results fused via RRF scoring.
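The 512-token windows with 50-token overlap from the ingestion step can be sketched as a simple stride over the token list. Whitespace "tokens" stand in here for the embedding model's real tokenizer:

```python
# Sketch of the ingestion chunking step: 512-token windows, 50-token
# overlap, so no clause is split cleanly across a retrieval boundary.
# Production code would use the bge-m3 tokenizer, not word splitting.
def chunk(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = [f"tok{i}" for i in range(1100)]
chunks = chunk(doc)
print(len(chunks))  # 3 windows: tokens 0-511, 462-973, 924-1099
```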
● ● ●
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://qdrant:6333")
# Hybrid search: dense vectors + BM25 sparse, fused via RRF
results = client.query_points(
    collection_name="sec_filings",
    prefetch=[
        models.Prefetch(query=dense_embedding, using="dense"),  # 1024-dim bge-m3
        models.Prefetch(query=bm25_vec, using="bm25"),          # keyword match
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=8,  # top 8 fused chunks
)
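The RRF fusion in the final step combines the two rankings by summed reciprocal ranks. A minimal sketch of the scoring; k = 60 is the conventional default from the RRF literature, an assumption here rather than a Qdrant-confirmed value:

```python
# Sketch of Reciprocal Rank Fusion: each document scores the sum of
# 1/(k + rank) over every ranking it appears in, so documents that
# rank well in BOTH dense and keyword search float to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["doc_a", "doc_b", "doc_c"]   # dense-vector ranking
sparse_hits = ["doc_a", "doc_c", "doc_d"]   # BM25 keyword ranking
print(rrf([dense_hits, sparse_hits]))  # doc_a first: top of both lists
```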
OUTPUT VALIDATION — CODE
Guardrails AI + Pydantic: Making LLM Output Deterministic
The exact schema contract that converts probabilistic text into banking-core-ready JSON.
from pydantic import BaseModel, Field, validator
from enum import Enum

class TxnStatus(str, Enum):
    APPROVED = "APPROVED"
    BLOCKED = "BLOCKED"
    REVIEW = "REVIEW"

class ComplianceReport(BaseModel):
    transaction_id: str
    status: TxnStatus
    risk_score: float = Field(ge=0.0, le=1.0)
    proof_identifier: str

    @validator("risk_score")
    def must_be_high_if_blocked(cls, v, values):
        if values.get("status") == "BLOCKED" and v < 0.7:
            raise ValueError("Risk score mismatch")
        return v
Type Enforcement
Every field has a strict type. status must be one of {APPROVED, BLOCKED, REVIEW} — free text is rejected.
Cross-Field Validation
Custom validators check logical consistency: BLOCKED status requires risk_score ≥ 0.70. Catches LLM contradictions.
Guardrails AI Layer
Wraps the Pydantic model. On failure: logs the violation, triggers Auditor Agent retry, emits NACK to banking core.
Proof Identifier
Every approved output gets a deterministic PROOF-XXXXXXXX hash. Banking core can verify without re-running the LLM.
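A deterministic PROOF-XXXXXXXX identifier can be derived by hashing the canonical payload, so the banking core can recompute and verify it without touching the LLM. A sketch; the exact hashing scheme used by the stack is an assumption:

```python
import hashlib
import json

# Sketch: derive a deterministic proof ID by hashing the canonical
# (sorted-key, compact) JSON payload. The PROOF-XXXXXXXX shape matches
# the sample output; the concrete scheme is an assumption.
def proof_id(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return f"PROOF-{digest[:8].upper()}"

report = {"transaction_id": "TXN-987654321", "status": "BLOCKED"}
print(proof_id(report))  # same payload always yields the same ID
```

Because keys are sorted before hashing, two systems serializing the same report in different field orders still agree on the ID.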
Session Complete
The Sovereign Quant
Building Private Agentic Workflows with Open Source LLMs
Scan for slides + GitHub repo link after session
Questions: https://www.linkedin.com/in/chetan-hirapara
Next talk: "Fine-Tuning Llama 4 on SEC Filings with LoRA"
Thank you for your attention
CONNECT WITH ME
Scan for Code
THANK YOU