Transforming Generative AI from Unsustainable to Attainable
Sree Ganesan, VP Product, d-matrix.ai
2024 © d-Matrix
The Exploding World of Generative AI
Microsoft: 1 trillion inferences/day
Meta: 200 trillion inferences/day
Search Engines
Image Generation
Content Creation
Conversational Agents
Question Answering Systems
Code Generation
Video Generation
3D Models / Scenes
Digital Twins
Smart Factories
Model Repo
But… Demand WAY Outstrips Supply
d-Matrix breaking through the barrier with In-Memory Compute
Why?
Cost
$100,000,000,000 in CAPEX alone to deploy ChatGPT or Bard into every Google Search
28,936 Nvidia GPUs and $700,000/day for OpenAI to run ChatGPT
When asked if training GPT-4 cost $100,000,000, Altman replied, “It’s more than that.”
Power
Without sustainable practices, AI will consume more energy than the human workforce by 2025.
The energy needed to power AI could account for up to 3.5% of global electricity consumption by 2030 if current practices remain unchanged.
The Generative AI Race Has a Dirty Secret
Integrating LLMs into search engines could mean a fivefold increase in computing power and huge carbon emissions.
Size
“I think we're at the end of the era where it's going to be these, like, giant, giant models. We'll make them better in other ways.”
- Sam Altman, OpenAI CEO
April 2023
Source: CSET, Georgetown University, 2022
Note: The blue line represents growing costs assuming compute per dollar doubles every four years, with error shading representing no change in compute costs or a doubling time as fast as every two years. The red line represents expected GDP at a growth of 3% per year from 2019 levels with error shading representing growth between 2% and 5%.
Bigger ≠ Better
Unique Challenges of Generative Inference
🡪 Requires more memory capacity
🡪 Requires more compute capacity
🡪 Requires both high memory bandwidth and high peak compute capability
All of the above contribute acutely to pain points in cost, performance, and power
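The memory-bandwidth pain point above can be made concrete with a back-of-envelope roofline estimate: in autoregressive decoding, every generated token reads the full weight matrix, so small batches leave the compute units starved. This is a minimal sketch with illustrative hardware numbers (not d-Matrix or GPU specifications):

```python
# Roofline check: why autoregressive decode is memory-bandwidth bound.
# All hardware numbers below are illustrative assumptions.

def arithmetic_intensity(params_b: float, batch: int) -> float:
    """FLOPs per byte moved for one decode step of a dense LLM.

    Each sequence performs ~2 FLOPs per parameter per token, while the
    fp16 weights are read from memory once per step regardless of batch,
    so intensity grows linearly with batch size.
    """
    flops = 2 * params_b * 1e9 * batch    # matmul FLOPs per decode step
    bytes_moved = 2 * params_b * 1e9      # fp16 weights read once
    return flops / bytes_moved            # simplifies to `batch`

# Illustrative accelerator: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
peak_flops, mem_bw = 300e12, 2e12
ridge = peak_flops / mem_bw  # intensity needed to become compute-bound

for batch in (1, 8, 64, 256):
    ai = arithmetic_intensity(params_b=70, batch=batch)
    bound = "compute" if ai >= ridge else "memory-bandwidth"
    print(f"batch={batch:4d}  intensity={ai:6.1f} FLOP/B  -> {bound}-bound")
```

At small batch sizes the intensity sits far below the ridge point, which is why raising memory bandwidth (rather than peak compute) is the lever that matters for generative inference.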
What d-Matrix is doing about it
A New Computing Paradigm is Needed
[Figure: The A.I. Barrier. A traditional architecture keeps its multiply and accumulate units separate from memory, connected by a low-bandwidth link. The Digital-In-Memory-Compute architecture embeds multiply units directly inside the memory arrays, giving high bandwidth between storage and compute.]
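The bandwidth contrast in the diagram comes down to data movement. The toy comparison below counts bytes crossing the memory boundary for a single matrix-vector multiply under each architecture; the matrix size is an illustrative assumption, not a d-Matrix specification:

```python
import numpy as np

# Toy data-movement comparison for y = W @ x.
# Traditional architecture: the weight matrix W is streamed from memory
# to the compute units on every inference.
# In-memory compute: weights stay resident in the memory arrays; only
# the activation vector x and the result y cross the boundary.

rows, cols = 4096, 4096
W = np.random.randn(rows, cols).astype(np.float16)
x = np.random.randn(cols).astype(np.float16)

y = W @ x  # same math either way; only the data movement differs

traditional_bytes = W.nbytes + x.nbytes + y.size * 2  # weights + activations
dimc_bytes = x.nbytes + y.size * 2                    # activations only

print(f"traditional: {traditional_bytes / 1e6:.1f} MB moved per matvec")
print(f"in-memory:   {dimc_bytes / 1e6:.3f} MB moved per matvec")
print(f"reduction:   {traditional_bytes / dimc_bytes:.0f}x")
```

Because weight traffic dominates by orders of magnitude at these shapes, keeping weights stationary inside memory is what lets an in-memory architecture sidestep the low-bandwidth link.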
Three Generations of Proven Silicon
Nighthawk: world's first IMC | compiler + mapper
Jayhawk I: world's first BoW chiplet | 2 TB/s die-to-die bandwidth
Jayhawk II: world's first DIMC + chiplet | 150 TOPS/W, 150 TB/s SRAM bandwidth
Corsair: efficient GenAI inference
Corsair Hardware
Aviator Software
Aviator: enterprise-grade software for easy and fast inference deployment
Easily integrate Aviator with open ecosystem tools or your own deployment stack:
🡪 Convert the model to enable Corsair numerics & sparsity
🡪 Compile and optimize the model to run on Corsair
🡪 Distribute the workload across cards and servers
🡪 Optimize inference runtime and model serving on Corsair
🡪 Orchestrate, manage and monitor Corsair cards and clusters
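The convert / compile / distribute / serve stages above have the shape of a staged deployment pipeline. The sketch below only illustrates that shape; every module, function, and file name here is hypothetical and is not a real d-Matrix or Aviator API:

```python
# Hypothetical sketch of a convert -> compile -> distribute -> serve
# pipeline, mirroring the Aviator workflow stages. All names invented
# for illustration only.

from dataclasses import dataclass


@dataclass
class DeploymentPlan:
    model: str
    cards: int
    servers: int


def convert(model: str) -> str:
    """Stage 1: convert model weights to target numerics/sparsity."""
    return f"{model}.corsair"


def compile_model(converted: str) -> str:
    """Stage 2: compile and optimize the converted model."""
    return f"{converted}.bin"


def distribute(binary: str, cards: int, servers: int) -> DeploymentPlan:
    """Stage 3: shard the compiled workload across cards and servers."""
    return DeploymentPlan(model=binary, cards=cards, servers=servers)


def serve(plan: DeploymentPlan) -> str:
    """Stage 4: launch the inference runtime and report status."""
    return f"serving {plan.model} on {plan.servers}x{plan.cards} cards"


plan = distribute(compile_model(convert("llama-70b")), cards=8, servers=2)
print(serve(plan))
```

Keeping each stage a pure function over the previous stage's artifact is a common design for such stacks, since it lets an orchestration layer retry, cache, or monitor each step independently.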
Current focus: Datacenter Inference
Cloud
On-Premises
GenAI inference: Datacenter Scale
Solution varies based on customer’s datacenter infrastructure, including rack height & rack power density
Working with OEMs to build inference servers with d-Matrix PCIe cards
[Figure: Inference server (4U or 5U) with dual CPUs and PCIe switches connecting the d-Matrix cards.]
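Since the solution varies with rack height and rack power density, a quick sizing helper shows how those two constraints interact. The server power draw below is an illustrative assumption, not a d-Matrix specification:

```python
# Rough rack-fit estimate for inference servers. Server power draw is
# an illustrative assumption; only the 4U/5U form factors come from
# the slide above.

def servers_per_rack(rack_u: int, rack_kw: float,
                     server_u: int, server_kw: float) -> int:
    """Servers per rack, limited by whichever runs out first:
    rack height (in U) or the rack power budget (in kW)."""
    by_space = rack_u // server_u
    by_power = int(rack_kw // server_kw)
    return min(by_space, by_power)


# 42U rack, hypothetical 3.5 kW per 4U server:
# a 17 kW rack is power-limited; a 40 kW rack is space-limited.
print(servers_per_rack(rack_u=42, rack_kw=17.0, server_u=4, server_kw=3.5))
print(servers_per_rack(rack_u=42, rack_kw=40.0, server_u=5, server_kw=3.5))
```

This is why the same PCIe card yields different rack-level configurations per customer: low power-density facilities hit the power ceiling long before the rack is physically full.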
The d-Matrix Advantage
Circuits & Numerics
Compute (DIMC)
Chiplets & Advanced Packaging
Software
Making Generative AI commercially viable
Significant benefits over GPU:
🡪 Better Throughput
🡪 Better Latency
🡪 Better TCO
Build With Us, Partner With Us
INTELLIGENCE DELIVERED™
www.d-matrix.ai