4 of 29

you

rn

9 of 29

What Exactly is an Agent?

“An LLM agent runs tools in a loop to achieve a goal.”

  • LLM Agent
  • Tools (e.g., via MCP)
  • In a loop
  • To achieve a goal
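
The one-sentence definition on this slide can be sketched as a short loop. This is a minimal illustration, not any particular framework's API: `scripted_model` is a hypothetical stand-in for a real LLM call, `count_rows` is a made-up tool with made-up data, and in practice the tools might be reached over a protocol such as MCP rather than called as local functions.

```python
from typing import Callable, Dict, List, Tuple

# One past step: (tool name, argument, observation returned by the tool).
Step = Tuple[str, str, str]


def count_rows(table: str) -> str:
    """Hypothetical tool: look up a row count in made-up data."""
    fake_counts = {"orders": 1200, "customers": 300}
    return str(fake_counts.get(table, 0))


# Tool registry (assumption: plain callables instead of MCP servers).
TOOLS: Dict[str, Callable[[str], str]] = {"count_rows": count_rows}


def scripted_model(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Stand-in for an LLM: picks the next action from the history so far."""
    if len(history) == 0:
        return ("count_rows", "orders")
    if len(history) == 1:
        return ("count_rows", "customers")
    total = sum(int(observation) for _, _, observation in history)
    return ("finish", f"total rows: {total}")


def run_agent(goal: str,
              model: Callable[[str, List[Step]], Tuple[str, str]] = scripted_model,
              max_steps: int = 10) -> str:
    """Run tools in a loop until the model declares the goal achieved."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, argument = model(goal, history)
        if action == "finish":           # goal reached: leave the loop
            return argument
        observation = TOOLS[action](argument)   # run the chosen tool
        history.append((action, argument, observation))
    raise RuntimeError("step budget exhausted before the goal was reached")


print(run_agent("How many rows are in orders and customers combined?"))
```

The `max_steps` budget is a design choice in this sketch: because the loop is driven by the model's own decisions, a hard cap keeps a confused agent from issuing queries indefinitely.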

10 of 29

10×

Enterprise data growth, year over year

…not ONLY

Tokenmaxxing!

11 of 29

Human-cadence data systems 🧑‍💻

Designed for batch queries

QA cycles: ~16 days

Policy approvals: ~6 weeks

Agent-cadence query loads 🤖

Continuous, autonomous queries

Self-directed exploration

10× YoY data growth

12 of 29

Snowflake

Databricks

BigQuery

Hadoop

S3

ADLS

GCS

on-prem clusters

13 of 29

Vendor lock-in

Cost spirals

Expensive ETL

Fragmented governance

No schema context

14 of 29

Factory Floors

Customers' production sites (On-Prem)

Office

Their own teams (Cloud)

15 of 29

Portable compute

Runs anywhere data live

Federated catalog

Spans environments

Traveling governance

Policy moves with data

Intelligence layer

Reasons across the footprint
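
To make "federated catalog" and "policy moves with data" concrete, here is a minimal sketch under stated assumptions. The class names (`Policy`, `TableEntry`, `FederatedCatalog`), the table names, and the masking rule are all hypothetical; the slides do not describe xLake's actual catalog or policy model, and this is not how Apache Ranger expresses policies. The point is only that the policy is stored with the catalog entry, so the same rule applies in whichever environment the rows are read.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: columns that must be masked on read."""
    masked_columns: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class TableEntry:
    name: str
    environment: str            # e.g. "on-prem" or "cloud"
    columns: Tuple[str, ...]
    policy: Policy              # the policy is part of the entry itself


class FederatedCatalog:
    """One lookup surface over tables that live in different environments."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableEntry] = {}

    def register(self, entry: TableEntry) -> None:
        self._tables[entry.name] = entry

    def resolve(self, name: str) -> TableEntry:
        return self._tables[name]


def read_rows(entry: TableEntry, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply the entry's policy wherever the read happens."""
    return [
        {col: ("***" if col in entry.policy.masked_columns else row[col])
         for col in entry.columns}
        for row in rows
    ]


catalog = FederatedCatalog()
catalog.register(TableEntry(
    "sensor_readings", "on-prem",
    ("machine_id", "operator", "temp_c"),
    Policy(frozenset({"operator"})),
))
catalog.register(TableEntry("sales", "cloud", ("region", "revenue"), Policy()))

entry = catalog.resolve("sensor_readings")
rows = [{"machine_id": "m1", "operator": "alice", "temp_c": 71.5}]
print(entry.environment, read_rows(entry, rows))
```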

16 of 29

Factory Floors

Portable Spark · on-prem GKE

Office

Same Spark, in cloud

Federated catalog · Traveling governance · Intelligence

Control Plane

17 of 29

xLake Compute (Hybrid, On-premise, VPC)

Enterprise Context, Governance for Agents & Applications

Observability, Automated Operations & Semantic Knowledge Layer

xLake Platform

Data Runtime

Agentic Runtime

Agentic Data Management

Data Warehousing

Automated Data Engineering

AI Applications

Streaming Applications

Analytics & Business Applications

Data & AI Observability

18 of 29

TPC-DS benchmark · 12-node GKE cluster · near-linear to 6.3B rows

Peak: 3.26M rows/sec · 5× faster than baseline · 1TB profiled in under 3 minutes
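
As a rough sanity check on these figures, the arithmetic below derives two bounds from the numbers on the slide. It rests on assumptions the slide does not state: that 1 TB means 10^12 bytes, and, for the first bound, that the peak rate could be sustained over the whole 6.3B-row run. Neither result is a reported measurement.

```python
def seconds_at_rate(rows: float, rows_per_sec: float) -> float:
    """Time to process `rows` if `rows_per_sec` were sustained throughout."""
    return rows / rows_per_sec


def min_bytes_per_sec(total_bytes: float, max_seconds: float) -> float:
    """Lower bound on throughput when the job finishes within `max_seconds`."""
    return total_bytes / max_seconds


# Figures quoted on the slide.
PEAK_ROWS_PER_SEC = 3.26e6
LARGEST_RUN_ROWS = 6.3e9

# Assumption: 1 TB = 10**12 bytes; "under 3 minutes" taken as at most 180 s.
ONE_TB_BYTES = 1e12
PROFILE_SECONDS = 180.0

scan_seconds = seconds_at_rate(LARGEST_RUN_ROWS, PEAK_ROWS_PER_SEC)
profile_gb_per_sec = min_bytes_per_sec(ONE_TB_BYTES, PROFILE_SECONDS) / 1e9

print(f"6.3B rows at peak rate: about {scan_seconds / 60:.1f} minutes")
print(f"1 TB in under 3 minutes: at least {profile_gb_per_sec:.2f} GB/s")
```

Under those assumptions, the 6.3B-row dataset would take roughly half an hour at the peak rate, and the profiling claim implies a sustained rate of more than about 5.5 GB/s.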

19 of 29

45B rows validated in under 2 hours · top-3 US telco

Spark performance on same hardware · $150–300K saved annually

faster validation vs. baseline · TPC-DS benchmark

20 of 29

Snowflake

Centralized data warehouse

AI workloads

Growing rapidly

21 of 29

                  Databricks-style                   xLake
Deployment        Managed SaaS                       Customer-managed Kubernetes
Foundation        Proprietary runtime                Open source (Apache)
Data location     Migrate to platform                Stays where it lives
Governance        Unity Catalog (native)             Apache Ranger (portable)
Cost profile      DBUs, compute + storage coupled    Compute and storage decoupled
AI integration    Mosaic / Genie                     Schema-aware AI Studio

22 of 29

AI at speed             Slow deployment          Trusted data, fast
Self-healing            Reactive firefighting    Agents repair pipelines
Cost discipline         Runaway cloud spend      Decoupled, right-sized
Always-on compliance    Audit debt               Policy travels with data

23 of 29

Factory Floors

Office

Portable Spark on-site and in cloud

Federated catalog spanning both surfaces

One governance regime, end to end

Not 'cloud or on-prem.' Both. On one architecture.

24 of 29

Data don't move anymore.

The compute does.
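
A minimal sketch of what "the compute moves" can mean in practice: a small job is sent to each site, runs next to the data, and only a compact partial result travels back. Everything here is illustrative. The site names, the readings, and `run_at_site` (which simply calls a function locally) are hypothetical stand-ins for a real remote execution layer such as a Spark job submitted to each cluster.

```python
from typing import Callable, Dict, List

# Made-up readings that stay at their own site.
SITES: Dict[str, List[float]] = {
    "factory-floor": [20.5, 21.0, 19.5, 22.0],
    "office-cloud": [18.0, 18.5],
}


def run_at_site(site: str, job: Callable[[List[float]], Dict[str, float]]) -> Dict[str, float]:
    """Pretend to run `job` beside the data; only the job's result leaves the site."""
    return job(SITES[site])


def partial_stats(readings: List[float]) -> Dict[str, float]:
    """The shipped job: reduce local rows to two numbers."""
    return {"sum": sum(readings), "count": float(len(readings))}


def global_mean(sites: List[str]) -> float:
    """Combine per-site partial results without copying any raw rows."""
    partials = [run_at_site(site, partial_stats) for site in sites]
    total = sum(p["sum"] for p in partials)
    count = sum(p["count"] for p in partials)
    return total / count


print(global_mean(list(SITES)))
```

The raw rows never cross a site boundary in this sketch. Each site returns two numbers regardless of how many rows it holds, which is why shipping the job can be far cheaper than shipping the data.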

29 of 29

youtube.com/c/JonKrohnLearns

linkedin.com/in/jonkrohn

jonkrohn.com/talks