Sovereign Intelligence: Deploying Air-Gapped Agentic Workflows for Financial Compliance
Chetan Hirapara
Presented By
Lead Data Scientist @Teradata
About Me
SESSION ROADMAP
What We're Building Today
01
The Compliance Crisis
$61B problem — why AI is the only answer
04
Live Demo: $50K Wire Fraud
Multi-agent debate in action
02
The Financial AI Trilemma
Privacy vs. Reasoning vs. Auditability
05
Open Source vs Proprietary
2026 CTO decision framework
03
The Sovereign Stack
Open-source architecture deep dive
06
Build vs Buy Calculus
ROI, hardware and team requirements
"In Finance, the model is the engine.
Privacy is the chassis."
Without open source, you're renting intelligence from a competitor.
01
SOVEREIGNTY
Your data. Your weights. Your logic. Owning your AI stack is your competitive moat.
02
TRANSPARENCY
Open weights means every decision is reproducible. Full audit trails by design.
03
ADVERSARIALISM
Agents that challenge each other are more reliable than any single model.
04
DETERMINISM
Banking cores need schemas. Guardrails AI turns text into certified structured data.
PRINCIPLES
The Business Case & Strategic Why
THE PROBLEM — 2026 DATA
The $61 Billion Compliance Trap
Financial institutions spend a fortune on manual KYC/AML — and are still getting fined billions.
$61B
Annual AML/KYC cost
(US + Canada)
$72.9M
Average per-firm spend
on KYC/AML ops
$3.8B
Global regulatory fines
in 2025
417%
YoY AML penalty surge
H1 2025 vs H1 2024
99% of US/Canada firms reported rising compliance costs
TD Bank fined $1.3B — largest FinCEN penalty in history
Only 4% of SARs ever receive law enforcement follow-up
AI-driven KYC cuts costs 40–70% and halves false positives
THE ROOT CAUSE
Why Single-Prompt LLMs Fail at Finance
Replacing monolithic prompts with a team of specialized adversarial agents is the key breakthrough.
Single Model — The Old Way
Hallucinates on multi-step financial reasoning
Cannot reliably critique its own logic
No separation of legal vs. quantitative tasks
Probabilistic output — banks need deterministic schemas
Single context — no adversarial validation layer
VS
Agentic Team — The Sovereign Way
Distributes cognitive load across specialist agents
Auditor Agent challenges every finding adversarially
Legal Scholar + Data Quant work in parallel
Multi-agent debate achieves mathematical & legal consensus
Guardrails AI enforces deterministic schema on output
Result: Verified, consensus-driven output — with a full audit trail of every agent's reasoning step.
CORE CHALLENGE
The Financial AI Trilemma
You can't pick two. The Sovereign Stack delivers all three simultaneously.
Agentic
Reasoning
Multi-step logic. Dynamic financial analysis.
Impossible with single-prompt LLMs.
Absolute
Privacy
Zero cloud exposure. Air-gapped on-premise.
Transaction data never leaves your data center.
Audit-Grade
Output
Deterministic schemas. Immutable audit logs.
Banking-core-compatible structured JSON.
?
The unsolved
intersection
Until now.
The Mandate: Bring autonomous, reasoning-capable AI entirely within the private data center — without sacrificing speed or deterministic reliability.
The
Sovereign
Quant
Building Private Agentic Workflows with Open Source LLMs
Data Sovereignty
Agentic Reasoning
Audit-Grade Output
CONFIDENTIAL · FOR CONFERENCE DISTRIBUTION ONLY
The Architecture
AGENT ARCHITECTURE
Meet Your Compliance Team
Four specialist AI agents that debate, challenge, and validate until consensus is reached.
Controller Agent
THE SUPERVISOR
Central router. Ingests the initial prompt, delegates sub-tasks, and synthesizes the final consensus output.
LangGraph / AutoGen
Legal Scholar Agent
THE REGULATOR
Grounded in Qdrant vector DB. Cross-references SEC filings, FINRA Rule 3310, KYC policy docs, live OFAC sanctions.
BM25 Hybrid Search + RAG
Data Quant Agent
THE MATHEMATICIAN
Parses transaction logs. Recalculates margins. Identifies statistical anomalies across 128K+ token contexts.
Llama 4 Scout (10M ctx)
Auditor Agent
THE ADVERSARY
The internal skeptic. Aggressively challenges all agent findings. Triggers retry loops when logical flaws found.
Multi-Agent Debate Protocol
MULTI-AGENT DEBATE & REASONING LOOP
ZERO-TRUST PRIVACY
The Filtration Funnel: PII Never Reaches the LLM
By architecture, not policy — raw transaction data is anonymized before any agent ever sees it.
UNTRUSTED
Raw Data In
Transaction logs with full PII: names, account numbers, SSNs, beneficiary data, addresses.
>
FILTERING
Local Presidio
(NER/Regex)
Runs 100% offline. Auto-identifies and redacts PII. No network calls. Zero external exposure.
>
TRUSTED
Anonymized Tokens
to Agents
Agents reason over anonymized structures only. Zero memory-leakage risk across all debate loops.
Raw transaction data never reaches any language model — by architecture, not policy
Local Presidio operates entirely offline as NER/Regex mesh — zero network calls
Multi-agent loop reasons only over anonymized tokens — no memory leakage possible
DeepSeek & Llama run air-gapped — models never phone home to any external API
LIVE DEMO SCENARIO
Demo: The Suspicious $50K Wire Transfer
Walking through a real compliance check — step by step, agent by agent.
TRIGGER: $50,000 wire transfer flagged. Sender: unknown shell company. Destination: offshore jurisdiction.
1
Controller Agent →
Request Ingested
Controller decomposes the task and routes parallel queries to all sub-agents simultaneously.
2
Data Quant Agent →
Mathematical Verification
Parses logs. Finds pattern: 3 transfers in 48h, each just below $10K structuring threshold.
3
Legal Scholar →
Entity & Sanctions Check
Searches Qdrant + OFAC + adverse media. Result: entity flagged on internal watchlist.
4
Controller Agent →
Draft Report Submitted
Synthesizes findings. Drafts "Proceed with Enhanced Due Diligence." Submits to Auditor.
5
Auditor Agent →
REJECTION — Retry Forced
Auditor: Legal Scholar watchlist hit was underweighted. Forces debate loop iteration #2.
6
All Agents →
Consensus Reached
Final: TRADE BLOCKED – Compliance Violation. Schema validated. Audit JSON delivered to core.
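The structuring pattern the Data Quant flags in step 2 can be sketched as a simple sliding-window check. A minimal sketch; the record shape, the 90% "just below threshold" band, and the helper name are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Sketch of the structuring heuristic from step 2: several transfers,
# each just below the $10K reporting threshold, within a 48-hour window.
# Thresholds and the transaction record shape are assumptions.
THRESHOLD = 10_000
WINDOW = timedelta(hours=48)
MIN_COUNT = 3

def flag_structuring(transfers: list[dict]) -> bool:
    # Keep only transfers in the suspicious band just under the threshold
    near = sorted(
        (t for t in transfers if 0.9 * THRESHOLD <= t["amount"] < THRESHOLD),
        key=lambda t: t["ts"],
    )
    # Any MIN_COUNT consecutive near-threshold transfers inside the window?
    for i in range(len(near) - MIN_COUNT + 1):
        if near[i + MIN_COUNT - 1]["ts"] - near[i]["ts"] <= WINDOW:
            return True
    return False
```

On the demo data (three $9.5K transfers within 48h) this returns True, which is exactly the pattern the Data Quant reports.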
SYSTEM OUTPUT
The Final Payload: Dual-Output Architecture
Every compliance run produces two artefacts — one for machines, one for auditors.
Structured JSON for Banking Core
{
  "transaction_id": "TXN-987654321",
  "status": "BLOCKED",
  "risk_score": 0.94,
  "compliance_checks": {
    "aml_screening": "FAIL",
    "kyc_verification": "PASS",
    "sanctions_list": "MATCH",
    "agent_consensus": true,
    "debate_iterations": 2
  },
  "audit_trail": {
    "decision_ts": "2026-03-31T14:22:01Z",
    "proof_id": "PROOF-A1B2C3D4E5"
  }
}
✅ SCHEMA VALIDATED — AUDIT READY
Audit-Grade Risk Report
Transaction Assessment
TXN-987654321 HIGH RISK. Multi-agent adversarial debate: 2 iterations. Final consensus: BLOCKED.
Compliance Checks
AML screening FAIL. KYC verification PASS. OFAC sanctions MATCH found. FINRA Rule 3310 reviewed.
Mathematical Proofs
Theorem 1.1: ∀t ∈ T, V(t) ∈ {Approved, Rejected, Flagged}. Agent consensus verified via GRPO.
Debate Log
Loop #1 rejected by Auditor: Legal Scholar watchlist hit underweighted. Loop #2 consensus reached.
REAL-WORLD EVIDENCE
Case Studies: Agentic AI Cutting Compliance Costs
Early adopters of agent-based AML/KYC architectures are already seeing measurable results.
Tier-1 US Investment Bank
Problem: 16 million AML alerts per year. Only 4% warranted follow-up. Analysts spending 60% of time on false positives.
Solution: Multi-agent triage: Quant agent pre-scores transactions. Legal Scholar cross-checks against watchlists. Auditor approves or escalates.
Result: False positive rate reduced by 52%. Analyst capacity freed by 40%. Annual compliance opex down $18M.
LangGraph · DeepSeek R1 70B · Qdrant · Guardrails AI
European RegTech Consortium
Problem: 3 banks sharing AML intelligence. Cannot share raw customer data due to GDPR. Needed collaborative detection without data pooling.
Solution: Federated learning via Flower.ai. Each bank trains locally. Gradient averaging produces a shared detection model.
Result: Cross-institution fraud detection improved 38%. Zero customer data shared. GDPR compliance maintained throughout.
Flower.ai · Federated Learning · Differential Privacy ε=8
Singapore Digital Bank
Problem: 50,000+ scam cases in 2025. Rapid fund movements outpaced traditional rules-based detection tools.
Solution: Real-time agentic pipeline: transaction arrives → PII stripped via Presidio → Quant + Legal agents analyse in parallel → output in <2s.
Result: Detection latency reduced from 4 hours to 1.8 seconds. Scam interception rate improved by 61%.
Local Presidio · vLLM · TensorRT-LLM · Redis
Sources: Fenergo 2025, Napier AI 2025, Silenteight Q4 2025, LexisNexis Global Financial Crime Compliance 2024
Open Source vs. Proprietary: The CTO View
The capability gap has closed. The deployment trade-offs have not.
Capability | Open Source (DeepSeek-V3 / Llama 4) | Proprietary (GPT-5 / Claude Opus 4.6)
Data Privacy | ✅ Full On-Prem / VPC | ⚠️ Trust agreements only
Fine-Tuning | ✅ Deep — LoRA, full param | ❌ API-based only, limited
Cost at Scale | ✅ Low OpEx after GPU invest | ❌ High & unpredictable
Model Transparency | ✅ Full weights & logic visible | ❌ Black box — no auditability
Reasoning (MATH-500) | ✅ 97.3% DeepSeek R1 (MIT) | ✅ ~98% GPT-5 / o3
Context Window | ✅ 10M tokens (Llama 4 Scout) | ✅ 1M tokens (GPT-5)
Vendor Lock-in | ✅ None — weights are yours | ❌ Full API dependency
Regulatory Audit Trail | ✅ Fully reproducible runs | ⚠️ Limited — provider logs only
Inference Cost /1M tk | ✅ $0.14 (DeepSeek API) | ❌ $2.50–$15.00 (GPT-5/Claude)
"In Finance, the model is the engine — but Privacy is the chassis. Without open source, you're renting intelligence from a competitor."
IMPLEMENTATION GUIDE
Your 90-Day Sovereign Stack Roadmap
Three phases. One air-gapped, production-ready compliance engine.
PHASE 1
Days 1–30
Foundation
MILESTONE
LLM answering domain queries fully on-premises
PHASE 2
Days 31–60
Orchestration
MILESTONE
First end-to-end compliance check runs locally
PHASE 3
Days 61–90
Harden & Audit
MILESTONE
Production-ready, audit-grade output certified
Stack: vLLM · LangGraph · Qdrant · Local Presidio · Guardrails AI · Kong · DeepSeek R1 / Llama 4 Scout · Redis · Apache Hop
BUSINESS CASE
Build vs. Buy: The ROI Math
AI-driven compliance can cut costs 40–70%. The numbers justify the GPU investment.
40–70%
Cost reduction
via AI-driven KYC
50%
Reduction in
false positive alerts
$23.4B
US savings potential
from AI compliance
8× H100
GPUs needed for
full DeepSeek R1
OPEN SOURCE MODEL LANDSCAPE
2026 State of Open Models for Finance
The capability gap with proprietary models has largely closed. The cost gap has not.
DeepSeek R1
MIT ✅
671B MoE · 37B active
▸ 97.3% MATH-500 — highest open model score
▸ Chain-of-thought reasoning transparency
Llama 4 Scout
Llama 4 Community
109B total · 17B active
▸ 10M token context — ingest entire 10-K in one shot
▸ Single H100 with quantization
DeepSeek V3.2
MIT ✅
671B MoE · MIT license
▸ GPT-5-level reasoning on coding + math
▸ Integrated thinking in tool-use workflows
💡 10× cost drop/year: GPT-4-equivalent reasoning cost $20/1M tokens in 2023 vs $0.14 today with open models.
INFRASTRUCTURE
Deployment Methods: vLLM vs TGI vs Ollama
Feature | 🚀 vLLM | 🛠️ TGI | 📦 Ollama
Core Philosophy | High-Throughput Engine | K8s Model Serving | Localhost Simplicity |
Best For | Multi-agent parallel chatter | Diverse architectures & HF-native tooling | Local dev, prototyping, & setup speed |
Optimization | PagedAttention & Dynamic Batching | Quantized weights & streaming output | Auto-downloading & simple CLI |
Hardware | Multi-GPU / Enterprise (A100/H100) | Native Docker & K8s scaling | Mac, Windows, Linux (Consumer & Pro) |
API | OpenAI-compatible | Deep HF ecosystem integration | Simple local REST endpoints |
Strategic Guidance:
COMPUTE REALITY
The Inference Imperative: Agentic Is Compute-Hungry
A single compliance query triggers dozens of parallel LLM calls. Hardware must keep up.
Runtime Comparison Matrix
Metric | vLLM | TensorRT-LLM
Best For | Multi-agent parallel chatter | Time-sensitive quant queries
Optimization | Dynamic continuous batching | Minimal latency, deep HW integration
GPU Support | Multi-GPU, any CUDA | NVIDIA-specific, precision kernels
Throughput | High — variable request sizes | Maximum — fixed precision kernels
Memory | Redis KV shared state | Redis KV shared state
Shared Memory Layer: Redis (Short-Term State)
Both runtimes connect to Redis for sub-millisecond state retrieval — agents remember debate context without constant prompt reloading.
Hardware Tiers
Full (DeepSeek R1 671B)
8× A100 80GB or H100
~$20K/mo
Mid (Llama 4 Maverick)
4× H100 or A100 40GB
~$8K/mo
Entry (Llama 4 Scout)
1× H100 + quantization
~$3.5K/mo
Edge (R1 70B distil)
2× A100 or 4× RTX 4090
~$1.5K/mo
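The tier sizing above follows from a rough VRAM rule of thumb: weights ≈ parameters × bytes per parameter, plus headroom for KV cache and activations. A sketch; the 20% overhead factor is an assumption for planning only:

```python
# Rule-of-thumb VRAM sizing behind the hardware tiers: parameter count
# (billions) times bits per weight, plus ~20% overhead for KV cache and
# activations. Planning estimate only, not a measured figure.
def vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    return round(params_b * bits / 8 * overhead, 1)

# 70B distil at 4-bit AWQ: 70 * 0.5 * 1.2 = 42.0 GB
# -> fits the Edge tier's 2x A100 comfortably
print(vram_gb(70, 4))
# 109B Llama 4 Scout at 4-bit -> a single 80GB H100 is tight
print(vram_gb(109, 4))
```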
Challenge: Sub-second latency for multi-agent chatter — entirely within local hardware constraints.
INFRASTRUCTURE
Deploying DeepSeek R1 with vLLM: From Zero to Serving
Production-ready LLM serving on-premises in under 30 minutes.
● ● ●
# 1. Pull the vLLM serving image
docker pull vllm/vllm-openai:latest
# 2. Serve DeepSeek R1 (70B distil, 2× A100)
docker run --gpus all \
-p 8000:8000 \
-v /models:/models \
vllm/vllm-openai:latest \
--model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
--quantization awq \
--max-model-len 131072 \
--tensor-parallel-size 2
● ● ●
# 3. Test endpoint (OpenAI-compatible)
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
"messages":[{"role":"user","content":"Analyse TXN-001"}],
"max_tokens":2048}'
AWQ Quantization
4-bit quantization via AutoAWQ reduces VRAM from 140GB → 35GB with <2% accuracy loss on financial tasks.
Continuous Batching
vLLM's PagedAttention dynamically batches concurrent agent requests. Critical for multi-agent parallel chatter.
Redis KV Cache
Share the KV cache across agents via Redis. Agents referencing the same document don't re-tokenize it.
Air-Gap Networking
Run vLLM fully offline: set HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 so no Hub calls are attempted, serve weights from the local /models mount, and bind to the internal interface only (--host 10.x.x.x).
Safety, Evidence & Q&A
PII REDACTION — CODE
Local Presidio: Zero-Leak PII Redaction in Code
The exact code pattern used to anonymize transaction data before any LLM ever sees it.
import json
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # runs 100% offline
anonymizer = AnonymizerEngine()

def redact_transaction(txn: dict) -> dict:
    text = json.dumps(txn)
    # Detect PII entities in the raw text
    results = analyzer.analyze(
        text=text, language="en",
        entities=["PERSON", "IBAN", "US_SSN",
                  "US_BANK_NUMBER", "EMAIL_ADDRESS",
                  "PHONE_NUMBER", "LOCATION"],
    )
    # Replace with anonymized tokens
    anonymized = anonymizer.anonymize(
        text=text, analyzer_results=results
    )
    return json.loads(anonymized.text)
Before Presidio (RAW)
"sender": "John Doe",
"acct": "123-456-789",
"ssn": "001-01-0001",
"dest": "Cayman Islands"
After Presidio (SAFE)
"sender": "<PERSON_1>",
"acct": "<IBAN_1>",
"ssn": "<US_SSN_1>",
"dest": "<LOCATION_1>"
Anonymization is reversible by the originating system only — the LLM agents never see the mapping table.
OUTPUT INTEGRITY
Deterministic Validation: From Probabilistic to Precise
Banking cores require deterministic schemas. Guardrails AI + Pydantic bridge the gap.
1
LLM Debate Output
Language models produce probabilistic natural-language text. Raw output is expressive but schema-free and non-deterministic.
>
2
Guardrails AI + Pydantic
Acts as an unyielding validation layer. Outputs are strictly type-checked and enforced into predefined schema contracts.
>
3
Deterministic Structured Schema
Every field validated, typed, and signed. Validation failure triggers the Auditor Agent for a self-correction loop.
Sample Validated JSON Output
{ "transaction_id": "TXN-987654321",
"status": "BLOCKED", "risk_score": 0.94,
"compliance_checks": {
"aml_screening": "FAIL", "kyc_verification": "PASS",
"sanctions_list": "MATCH", "agent_consensus": true },
"proof_identifier": "PROOF-A1B2C3D4E5" }
Validation Failure → Auto-Retry
Guardrails rejects payload and triggers Auditor Agent for a self-correction debate loop.
Validation Pass → Banking Core
Dual-payload: structured JSON for legacy mainframe + immutable human-readable risk report.
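The pass/fail routing above can be sketched as a validate-or-retry loop around the Pydantic schema. A minimal sketch with an abridged field set; the `retry` callback stands in for the Auditor's self-correction debate loop:

```python
from pydantic import BaseModel, Field, ValidationError

# Trimmed stand-in for the full ComplianceReport schema; the retry
# callback models the Auditor Agent's self-correction loop.
class ComplianceReport(BaseModel):
    transaction_id: str
    status: str
    risk_score: float = Field(ge=0.0, le=1.0)

def validate_or_retry(raw: dict, retry, max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return ComplianceReport(**raw)   # pass -> banking core
        except ValidationError:
            raw = retry(raw)                 # fail -> self-correction
    raise RuntimeError("No valid payload after max attempts")

# Toy retry: clamp an out-of-range risk score back into [0, 1]
fixed = validate_or_retry(
    {"transaction_id": "TXN-1", "status": "BLOCKED", "risk_score": 1.7},
    retry=lambda r: {**r, "risk_score": min(r["risk_score"], 1.0)},
)
print(fixed.risk_score)  # 1.0
```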
Q&A PREPARATION
Hard Questions — and the Honest Answers
Prepare for these. A technically sophisticated audience will ask all of them.
DeepSeek is Chinese. Can we trust it in finance?
The model weights are open — inspect every parameter. Run air-gapped; it cannot exfiltrate data. Audit the code on GitHub. The same scrutiny applies to any model.
How do you handle model drift over time?
Periodic fine-tuning on new regulatory text via LoRA. Automated benchmark regression tests on a held-out compliance dataset before any model update goes live.
What if the Auditor Agent itself hallucinates?
Max iteration cap (default 3). If consensus is not reached, the transaction is auto-BLOCKED and escalated to a human reviewer. The system fails safe, not open.
How do you prove this to a regulator?
Every decision has an immutable PROOF-ID + full agent debate transcript. The Pydantic schema is deterministic and reproducible. You can re-run any decision from logs.
Is this cheaper than buying a SaaS compliance tool?
Above ~10M tokens/month: yes. GPU CapEx is high upfront, but OpEx drops dramatically. Fenergo/Napier contracts typically run $2–5M/year at enterprise scale.
What about latency? Banks need sub-100ms.
Agent debate is not on the hot path for real-time payments. It runs async for high-value / flagged transactions. Real-time pre-screening uses a smaller, faster 7B model.
RISK MANAGEMENT
Key Risks — and How the Stack Mitigates Them
Every architecture choice addresses a specific failure mode in production finance AI.
RISK: LLM Hallucination
MIT: Multi-Agent Debate Protocol
Auditor Agent adversarially challenges every output. No consensus = no output.
RISK: PII Data Leakage
MIT: Local Presidio NER/Regex Layer
Raw transaction data anonymized BEFORE any agent sees it. 100% offline.
RISK: Non-Deterministic Output
MIT: Guardrails AI + Pydantic
Every output type-checked against predefined contracts. Failures trigger self-correction.
RISK: External API Dependency
MIT: Air-Gapped VPC Architecture
All models run locally. No external calls. DeepSeek and Llama never phone home.
RISK: GPU Latency at Scale
MIT: vLLM + TensorRT-LLM Runtime
Dynamic continuous batching handles erratic multi-agent chatter. Sub-second latency.
RISK: Regulatory Audit Failure
MIT: Immutable Proof Identifiers
Every decision carries a cryptographic proof ID + full agent debate log.
AUDIENCE FIT
Who Should Build the Sovereign Stack?
This is for organisations where data privacy is non-negotiable and intelligence is strategic.
Build Now — You Qualify If:
You handle regulated financial data (KYC/AML/SEC)
Transaction logs CANNOT touch public APIs like OpenAI
You need a full audit trail for every AI decision
Compliance costs exceed $10M/year and still rising
You have or can provision 8+ enterprise GPUs
Your ML team has LLM deployment experience
Start Smaller If:
GPU budget <$200K — start with quantised 70B via Ollama
Team <3 ML engineers — use managed vLLM on private cloud
Use case is non-regulated — proprietary APIs may suffice
No compliance mandate yet — build a proof-of-concept first
Data privacy needs are met by contractual agreements alone
Path to the full sovereign stack remains open as you scale
ADVANCED PRIVACY LAYER
Beyond Air-Gapping: Federated Learning & Differential Privacy
For multi-institution consortia: collaborative intelligence without sharing raw data.
Federated Learning (Flower.ai)
Each bank trains on its own data. Only model gradients — not raw transactions — are shared.
1
Local Model Training
Each institution fine-tunes the base LLM on its own transaction data — never shared.
2
Gradient Aggregation
Only model weight deltas are sent to a secure aggregation server (Flower.ai).
3
Federated Averaging
Global model is updated by averaging gradients. No raw data ever leaves any institution.
4
Improved Global Model
All participants benefit from collective intelligence without sacrificing data sovereignty.
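Step 3's federated averaging is just an element-wise mean over the submitted updates. A minimal sketch of the arithmetic; real Flower.ai deployments express this through its Strategy API rather than a hand-rolled loop:

```python
# Minimal sketch of federated averaging (step 3): each bank submits a
# weight-delta dict; the aggregation server averages them key by key.
def fed_avg(updates: list[dict[str, float]]) -> dict[str, float]:
    n = len(updates)
    return {k: sum(u[k] for u in updates) / n for k in updates[0]}

banks = [
    {"w1": 0.25, "w2": -0.25},   # bank A's local update
    {"w1": 0.50, "w2": 0.25},    # bank B
    {"w1": 0.00, "w2": 0.00},    # bank C
]
print(fed_avg(banks))  # {'w1': 0.25, 'w2': 0.0}
```

No bank's raw transactions appear anywhere in this exchange, only the aggregated deltas.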
Differential Privacy (DP Noise)
Mathematical guarantee: even if an attacker sees the model, they cannot infer any single customer's data.
from opacus import PrivacyEngine

# Attach DP to the LLM fine-tuning loop
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = \
    privacy_engine.make_private(
        module=llm_model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.1,  # ε ≈ 8
        max_grad_norm=1.0,
    )
ε (epsilon) = 8 → industry standard for financial models
Smaller ε = stronger privacy guarantee but lower accuracy
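The guarantee behind ε can be stated precisely. For a training mechanism M satisfying (ε, δ)-differential privacy (the δ term is the small failure probability Opacus also tracks alongside ε):

```latex
% For any two training sets D, D' differing in a single customer
% record, and any set S of possible trained models:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

Removing any one customer therefore changes the probability of any training outcome by at most a factor of e^ε (plus δ), which is what lets the consortium share gradients without exposing individuals.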
TECHNICAL DEEP-DIVE
Building the Debate Loop: LangGraph Walkthrough
How to wire four specialist agents into a cyclic adversarial reasoning graph in ~50 lines.
● ● ●
from langgraph.graph import StateGraph, END
from typing import TypedDict

class ComplianceState(TypedDict):
    transaction: dict
    quant_analysis: str
    legal_findings: str
    audit_verdict: str
    approved: bool

graph = StateGraph(ComplianceState)
# Register specialist agents as graph nodes
graph.add_node("controller", controller_agent)
graph.add_node("quant", data_quant_agent)
graph.add_node("legal", legal_scholar_agent)
graph.add_node("auditor", auditor_agent)
graph.set_entry_point("controller")
graph.add_edge("controller", "quant")
graph.add_edge("quant", "legal")
graph.add_edge("legal", "auditor")
# Conditional edge: auditor rejects → retry via controller
graph.add_conditional_edges(
    "auditor",
    lambda s: END if s["approved"] else "controller",
)
app = graph.compile()
controller_agent()
Routes state in parallel; synthesizes results for Auditor.
data_quant_agent()
DeepSeek R1 analysis of transaction logs for anomalies.
legal_scholar_agent()
Qdrant hybrid search for SEC/FINRA & OFAC cross-refs.
auditor_agent()
Adversarial critique; sets approval only when logic is air-tight.
Max iterations = 3. Consensus failure triggers human review.
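The 3-iteration cap and human-review fallback are not shown in the graph snippet above; one way to sketch them is inside the conditional router itself. Pure Python here, with the string "END" standing in for LangGraph's END sentinel, and the `approved`/`iterations` state fields assumed:

```python
# Sketch of the auditor's routing rule with the iteration cap. "END"
# stands in for LangGraph's END sentinel; the state fields "approved"
# and "iterations" are assumptions about ComplianceState.
MAX_ITERATIONS = 3

def route_after_audit(state: dict) -> str:
    if state.get("approved"):
        return "END"                  # consensus reached: emit payload
    if state.get("iterations", 0) >= MAX_ITERATIONS:
        return "human_review"         # fail safe, not open
    return "controller"               # force another debate loop

print(route_after_audit({"approved": False, "iterations": 1}))  # controller
print(route_after_audit({"approved": False, "iterations": 3}))  # human_review
```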
KNOWLEDGE BASE
Qdrant RAG Pipeline: Grounding the Legal Scholar Agent
How the Legal Scholar retrieves exact SEC clauses and policy docs at inference time.
Ingestion
Apache Hop + Airbyte pull SEC filings, FINRA rules, internal policy PDFs. Split into 512-token chunks with 50-token overlap.
>
Embedding + Indexing
Each chunk embedded with a local sentence-transformer model (BAAI/bge-m3). Stored in Qdrant with dense + sparse BM25 vectors.
>
Hybrid Search at Query
Legal Scholar agent sends query. Qdrant runs dense vector similarity AND BM25 keyword match. Results fused via RRF scoring.
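The 512-token windows with 50-token overlap from the ingestion step can be sketched as a simple stride over the token list. Whitespace "tokens" stand in here for the embedding model's real tokenizer:

```python
# Sketch of the ingestion chunking step: 512-token windows, 50-token
# overlap, so no clause is split cleanly across a retrieval boundary.
# Production code would use the bge-m3 tokenizer, not word splitting.
def chunk(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

doc = [f"tok{i}" for i in range(1100)]
chunks = chunk(doc)
print(len(chunks))  # 3 windows: tokens 0-511, 462-973, 924-1099
```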
● ● ●
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://qdrant:6333")
# Hybrid search: dense vectors + BM25 sparse, fused via RRF
results = client.query_points(
    collection_name="sec_filings",
    prefetch=[
        models.Prefetch(query=dense_embedding, using="dense"),  # 1024-dim bge-m3
        models.Prefetch(query=bm25_vec, using="bm25"),          # keyword match
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=8,  # top 8 fused chunks
)
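The RRF fusion in the final step combines the two rankings by summed reciprocal ranks. A minimal sketch of the scoring; k = 60 is the conventional default from the RRF literature, an assumption here rather than a Qdrant-confirmed value:

```python
# Sketch of Reciprocal Rank Fusion: each document scores the sum of
# 1/(k + rank) over every ranking it appears in, so documents that
# rank well in BOTH dense and keyword search float to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits  = ["doc_a", "doc_b", "doc_c"]   # dense-vector ranking
sparse_hits = ["doc_a", "doc_c", "doc_d"]   # BM25 keyword ranking
print(rrf([dense_hits, sparse_hits]))  # doc_a first: top of both lists
```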
OUTPUT VALIDATION — CODE
Guardrails AI + Pydantic: Making LLM Output Deterministic
The exact schema contract that converts probabilistic text into banking-core-ready JSON.
from pydantic import BaseModel, Field, validator
from enum import Enum

class TxnStatus(str, Enum):
    APPROVED = "APPROVED"
    BLOCKED = "BLOCKED"
    REVIEW = "REVIEW"

class ComplianceReport(BaseModel):
    transaction_id: str
    status: TxnStatus
    risk_score: float = Field(ge=0.0, le=1.0)
    proof_identifier: str

    @validator("risk_score")
    def must_be_high_if_blocked(cls, v, values):
        if values.get("status") == "BLOCKED" and v < 0.7:
            raise ValueError("Risk score mismatch")
        return v
Type Enforcement
Every field has a strict type. status must be one of {APPROVED, BLOCKED, REVIEW} — free text is rejected.
Cross-Field Validation
Custom validators check logical consistency: BLOCKED status requires risk_score ≥ 0.70. Catches LLM contradictions.
Guardrails AI Layer
Wraps the Pydantic model. On failure: logs the violation, triggers Auditor Agent retry, emits NACK to banking core.
Proof Identifier
Every approved output gets a deterministic PROOF-XXXXXXXX hash. Banking core can verify without re-running the LLM.
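A deterministic PROOF-XXXXXXXX identifier can be derived by hashing the canonical payload, so the banking core can recompute and verify it without touching the LLM. A sketch; the exact hashing scheme used by the stack is an assumption:

```python
import hashlib
import json

# Sketch: derive a deterministic proof ID by hashing the canonical
# (sorted-key, compact) JSON payload. The PROOF-XXXXXXXX shape matches
# the sample output; the concrete scheme is an assumption.
def proof_id(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return f"PROOF-{digest[:8].upper()}"

report = {"transaction_id": "TXN-987654321", "status": "BLOCKED"}
print(proof_id(report))  # same payload always yields the same ID
```

Because keys are sorted before hashing, two systems serializing the same report in different field orders still agree on the ID.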
Session Complete
The Sovereign Quant
Building Private Agentic Workflows with Open Source LLMs
Scan for slides + GitHub repo link after session
Questions: https://www.linkedin.com/in/chetan-hirapara
Next talk: "Fine-Tuning Llama 4 on SEC Filings with LoRA"
Thank you for your attention
CONNECT WITH ME
Scan for Code
THANK YOU