1 of 27

Copyright © 2026

Miguel Soto

Cloud Solutions Architect, LATAM

Intel Xeon 6

The Future of the AI-Powered Cloud


2 of 27


3 of 27


The majority of enterprise AI projects run on Intel® Xeon®, built for scalable, general-purpose AI workloads.


4 of 27


GenAI is shifting from GPU-heavy LLM training to smaller, more targeted models, inference optimization, and agentic AI.

And GenAI is only part of the enterprise AI story. For decades, enterprises have relied on general-purpose compute to power AI – from data analytics and machine learning to forecasting and fraud detection.


5 of 27

Enterprise AI, in reality, looks like this:

• Customer service teams leverage churn prediction models to proactively identify and retain high-value clients.
• Operations teams optimize supply chain decisions using predictive analytics to improve inventory accuracy and reduce delays.
• Finance teams use continuous fraud detection to monitor all transactions in real time and trigger instant alerts.
• Product developers accelerate designs with rapid prototyping and GenAI-powered simulation tools.

6 of 27

AI WORKLOAD CATEGORIES, CPU INVOLVEMENT, AND CPU ROLE

MATURE AI (CPU-involved):
• Data & Feature Engineering: ingestion, transformation, orchestration, vectorization
• Classical Machine Learning Training & Inference: native execution; optimal for training and inference; ensemble models, batch processing
• Deep Learning Inference: small/mid-size DL models, transformer inference, real-time and batch execution

EMERGING AI (CPU-involved):
• Generative AI Fine-Tuning / Training: multi-GPU orchestration, memory/I/O coordination, fallback compute
• Generative AI Inference: executes small/mid-size model inference with low-latency response; orchestrates RAG pipelines and optimizes MoE routing
• Agentic AI Orchestration: task routing, tool execution, hybrid model coordination
• Edge & Embedded AI: orchestrates low-latency GenAI inference at the edge, with fallback compute, RAG execution, and agentic coordination

PROOF POINTS
• CPUs handle ~70% of the AI pipeline workload, including ingestion, transformation, and vectorization
• CPUs have powered Mature AI for 30+ years
• SLMs and Agentic AI expected to drive 40%+ of enterprise AI deployments by 2027
• Inference will dominate 80% of AI cycles by 2028
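The vectorization and RAG-retrieval work attributed to CPUs above can be sketched in a few lines. This is a toy illustration with made-up three-dimensional "embeddings"; a real pipeline would use an embedding model and a vector store, but the core retrieval step is the same CPU-friendly linear algebra:

```python
import numpy as np

# Toy document "embeddings" (in practice these come from an embedding model).
docs = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.1],   # doc 1
    [0.7, 0.6, 0.2],   # doc 2
], dtype=np.float32)

query = np.array([1.0, 0.0, 0.0], dtype=np.float32)

# Cosine similarity: normalize rows, then one matrix-vector product.
docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
scores = docs_n @ query_n

# Retrieve the best-matching document to feed a generator (the "R" in RAG).
best = int(np.argmax(scores))
print(best)  # doc 0 is closest to the query
```

The same pattern scales to millions of documents, which is why ingestion, vectorization, and retrieval are typically CPU-resident stages of a RAG pipeline.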


7 of 27


Why CPUs now?

Because…

The foundation is already there. Your business, your developers, and your operations have operated in this CPU environment for 30 years.

• CPUs provide flexibility across your enterprise workloads and deployment environments.
• CPUs are still the ideal compute for mature AI and AI data architectures.
• CPUs are required in accelerator-based AI systems to serve as the central orchestrator.
• CPUs are foundational to secure AI infrastructure, enabling trusted execution and enterprise-grade data protection.
• CPUs are optimized for emerging AI workloads like SLM inference and Agentic AI.


8 of 27


With CPUs, the future is more accessible than you think.


9 of 27


10 of 27


Why choose our accelerated CPUs:

Trusted Compute Foundation

AI-Optimized Architecture

Deployment & Workload Flexibility


11 of 27


TRUSTED COMPUTE FOUNDATION

When we say Trusted Compute Foundation, we’re talking about a deep commitment to keeping your business safe, secure, and resilient - right from the heart of your infrastructure.

With Intel Xeon, security starts inside the chip itself. Technologies like SGX and TDX enable confidential computing and Zero Trust architectures at the hardware level - protecting data, applications, and memory even while in use.

Xeon is a proven foundation - trusted across enterprise workloads for decades. It integrates seamlessly into your existing environments, so you can modernize securely without retraining teams or rearchitecting systems.

100% of the Intel processor vulnerabilities addressed in 2024 were discovered through internal security research.

Intel scored 82.2, ranking #1 across the silicon industry for product security assurance maturity.

Intel reported 4.4x fewer firmware vulnerabilities in root-of-trust and 1.8x fewer in confidential computing technologies than AMD.



12 of 27

AI-OPTIMIZED INFRASTRUCTURE

When we say AI-optimized infrastructure, we mean giving your business the tools it needs to run enterprise AI smarter, faster, and easier – on the Xeon platform it already trusts.

Xeon accelerates AI workloads with built-in instruction sets that deliver intelligent performance out of the box, without changing your apps or retraining your teams.

Integrated accelerators handle data movement, analytics, and security - streamlining the AI pipeline and freeing up compute for what matters most.

And with high core density, advanced memory, and energy-efficient design, Xeon scales to support any AI workload - from single models to enterprise-wide deployments.

Xeon 6 with AMX delivers up to:
• 2.59× higher vector search throughput for RAG systems vs. AMD EPYC 9575F
• 1.93× higher DLRM performance vs. AMD EPYC 9654
• 1.85× faster BERT-Large inference vs. AMD EPYC 9654
• 17× faster ResNet-50 batch inference vs. AMD EPYC 9654

Xeon 6 handles up to 69,000 concurrent queries: up to 3.2× more concurrent prompts than AMD EPYC 9965.


13 of 27

DEPLOYMENT & WORKLOAD FLEXIBILITY

When we say Deployment & Workload Flexibility, we mean it's built to handle any business computing challenge - wherever and however you need it.

Xeon supports diverse workloads on a single platform, so you don't need separate systems for different jobs.

Deployments are simplified and future-ready thanks to Xeon's ability to run in public, private, sovereign, and edge environments - from data centers to retail stores to remote sites.

Its open ecosystem works with leading software stacks and avoids vendor lock-in, making it easy to launch new solutions and reduce investment risk as your business evolves.

• Xeon 6 delivers up to 50% lower TCO and stronger performance than AMD EPYC 9005 across diverse enterprise workloads.
• Supports predictable scaling and operational autonomy across deployment models.
• The Intel–NVIDIA collaboration aligns the x86 stack with CUDA architecture, simplifying deployment across GPU-accelerated AI workloads.


14 of 27

Matching GPU Price Performance Using Amazon Instances With Intel® Xeon® Processors

  • Storm Reply helps its customers deploy large language models (LLMs) and Generative AI solutions.
  • Needed a cost-efficient, high-availability hosting environment to build its LLM-based solution to serve a major company in the energy sector.
  • A solution developed for the Amazon C7i-family (shared with M7i and R7i) supported by 4th Gen Intel® Xeon® Scalable processors, Intel libraries, and Intel’s open GenAI framework proved an ideal hosting environment for Storm Reply’s LLM workloads.
  • LLM inference on instances with Intel Xeon Scalable processors was on par with GPU instance price performance. Intel libraries also delivered an average response time of 92 seconds, compared with the 485 seconds required without the Intel library.1

1 For more complete information about performance and benchmark results, visit https://www.intel.com/content/www/us/en/customer-spotlight/stories/storm-reply-customer-story.html
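As a quick sanity check on the case-study numbers above, the effect of the Intel libraries on average response time works out to roughly a 5× speedup:

```python
# Average response times reported in the Storm Reply case study (seconds).
with_intel_libs = 92
without_intel_libs = 485

speedup = without_intel_libs / with_intel_libs
print(f"{speedup:.1f}x faster average response")  # prints "5.3x faster average response"
```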

Case Study

Industry: IT Services & IT Consulting
Organization Size: 201-500
Country: Italy
Partners: AWS

15 of 27

Solving problems from AI/ML and Analytics to Database and HPC

16 of 27

Achieve Cost-Performance for the Workloads that Matter with new AWS 8th Gen EC2 Instances powered by custom Intel® Xeon® 6 CPUs

17 of 27

Delivering through our strong partnership

18 of 27

Intel and AWS Partnership Dates Back to the First EC2 Instance with an Intel® Xeon® Processor

“At AWS, we’re committed to delivering the most powerful and innovative cloud infrastructure to our customers. By co-developing next-generation AI fabric chips on Intel 18A, we continue our long-standing collaboration, dating back to 2006 when we launched the first Amazon EC2 instance featuring their chips. Our continued collaboration allows us to empower our joint customers with the ability to run any workload and unlock new AI capabilities.”
– Matt Garman, CEO at AWS

19 of 27

Hardware optimization

© Copyright 2026, Intel | Confidential – NDA Required

Intel® Xeon® generations and AWS instances over time:

• Intel Xeon (2006): M1 instances. Price: M1.2xlarge $255
• 2nd Gen Intel Xeon Scalable (2019): M5, C5, R5… Price: M5.2xlarge $280
• 3rd Gen Intel Xeon Scalable (2021): M6i, C6i, R6i… Price: M6i.2xlarge $280
• 4th Gen Intel Xeon Scalable (2023): M7i, C7i, R7i… Price: M7i.2xlarge $294 (Flex: $279)
• 5th Gen Intel Xeon Scalable (2024): I7ie, G7, P6. Price: N/A
• 6th Gen Intel Xeon Scalable (2025): M8i, M8id… Price: M8i.2xlarge $309 (Flex: $293)

*Prices based on AWS public calculator.


20 of 27

Intel® Architecture Instance Types on AWS

• General Purpose: a balance of compute, memory, and networking resources; suited to a variety of diverse workloads.
• Compute-Optimized: ideal for compute-bound applications that benefit from high-performance processors.
• Memory Optimized: designed to deliver fast performance for workloads that process large data sets in memory.
• Storage Optimized: designed for workloads that require high, sequential read and write access to very large data sets on local storage.
• Accelerated Compute: uses hardware accelerators, or co-processors, to perform functions more efficiently.
• HPC Optimized: ideal for applications that benefit from high-performance processors, including large, complex simulations and deep learning workloads.

Processor generations represented: Intel® Xeon® v3 and v4 processors, 1st through 6th Gen Intel® Xeon® Scalable processors, and Habana Gaudi (DL1).

Instance families include: T2, T3, M4, M5(d), M5(d)n, M5zn, C4, C5(d), C5n, R4, R5(d), R5(d)n, R5b, z1d, X1, X1e, X2idn, X2iedn, X2iezn, D2, D3, D3en, H1, I3, I3en, I4i, I7i, I7ie, HPC6id, M6i(d), M6i(d)n, C6i(d), C6in, R6i(d), R6i(d)n, M7i, M7i-Flex, C7i, C7i-Flex, R7i, R7iz, U7i, M8i, M8i-Flex, C8i, C8i-Flex, R8i, R8i-Flex, P2, P3, P3dn, P4d, G3, G4dn, F1, DL1.

Instances based on 4th Gen and later Intel® Xeon® Scalable processors feature Intel® AMX.

See https://aws.amazon.com/ec2/instance-types/ and speaker notes for details.


21 of 27

Instance Analysis

Current        $        6i           $/Perf*   7i           $/Perf*
c5.xlarge      0.2620   c6i.xlarge   20%       c7i.xlarge   33%
c6a.2xlarge    0.4716   c6i.2xlarge  3%        c7i.2xlarge  29%
m5.xlarge      0.3060   m6i.xlarge   20%       m7i.xlarge   33%
m5.2xlarge     0.6120   m6i.2xlarge  20%       m7i.2xlarge  33%
m6a.2xlarge    0.5508   m6i.2xlarge  3%        m7i.2xlarge  29%

Public prices, São Paulo Region. * Estimated.


22 of 27

Resource optimization


5:3 Instance Consolidation

• 2nd Generation Intel Xeon Scalable: C5.xlarge, 5 instances at $0.2620 each; total for 5 instances: $1.31; 40 GB RAM
• 4th Generation Intel Xeon Scalable: m7i-flex.xlarge, 3 instances at $0.3052 each; total for 3 instances: $0.9157; 48 GB RAM

30% potential savings

*Hourly price per instance in São Paulo, pay-as-you-go, Linux.
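The consolidation math behind the savings figure on this slide is straightforward to verify (using the hourly São Paulo on-demand Linux prices quoted above):

```python
# 5 x C5.xlarge consolidated onto 3 x m7i-flex.xlarge
# (hourly on-demand Linux prices, Sao Paulo region, per the slide).
c5_xlarge_hourly = 0.2620
m7i_flex_xlarge_hourly = 0.3052

old_total = 5 * c5_xlarge_hourly        # $1.31/hr
new_total = 3 * m7i_flex_xlarge_hourly  # ~$0.92/hr
savings = 1 - new_total / old_total

print(f"old: ${old_total:.2f}/hr, new: ${new_total:.2f}/hr, savings: {savings:.0%}")
# prints "old: $1.31/hr, new: $0.92/hr, savings: 30%"
```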


23 of 27

ITAU Unibanco – Case Study

The largest bank in Brazil and Latin America, with operations worldwide, serving some 55 million customers.


24 of 27

ITAU Unibanco – Case Study

Challenge: Transform and modernize its applications, reducing costs, increasing profits, and improving scalability.


25 of 27

ITAU Unibanco – Case Study

Solution: Migrated 99% of its private cloud and 20% of its distributed platform to AWS. 19,000+ servers.


26 of 27

ITAU Unibanco – Case Study

Results: 99% reduction in platform delivery time. Improved customer satisfaction. Infrastructure performance gains of 3.5x to 6.4x with AWS instances based on 4th Gen Intel Xeon.


27 of 27
