Transforming Generative AI from Unsustainable to Attainable
Sree Ganesan, VP Product, d-matrix.ai
2024 © d-Matrix
The Exploding World of Generative AI
Microsoft: 1 trillion inferences/day
Meta: 200 trillion inferences/day
Search Engines
Image Generation
Content Creation
Conversational Agents
Question Answering Systems
Code Generation
Video Generation
3D Models / Scenes
Digital Twins
Smart Factories
Model Repo
But… Demand WAY Outstrips Supply
d-Matrix breaking through the barrier with In-Memory Compute
Why?
Cost
$100,000,000,000 in CAPEX alone to deploy ChatGPT or Bard into every Google Search
28,936 Nvidia GPUs and $700,000/day for OpenAI to run ChatGPT
When asked if training GPT-4 cost $100,000,000, Altman replied, “It’s more than that.”
Power
Without sustainable practices, AI will consume more energy than the human workforce by 2025.
The energy needed to power AI could account for up to 3.5% of global electricity consumption by 2030 if current practices remain unchanged.
The Generative AI Race Has a Dirty Secret
Integrating LLMs into search engines could mean a fivefold increase in computing power and huge carbon emissions.
Size
“I think we're at the end of the era where it's going to be these, like, giant, giant models. We'll make them better in other ways.”
- Sam Altman, OpenAI CEO
April 2023
Source: CSET, Georgetown University, 2022
Note: The blue line represents growing costs assuming compute per dollar doubles every four years, with error shading representing no change in compute costs or a doubling time as fast as every two years. The red line represents expected GDP at a growth of 3% per year from 2019 levels with error shading representing growth between 2% and 5%.
Bigger ≠ Better
Unique Challenges of Generative Inference
🡪 Requires more memory capacity
🡪 Requires more compute capacity
🡪 Requires both high memory bandwidth and high peak compute capability
All of the above contribute acutely to pain points in cost, performance, and power
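The memory-bandwidth pain point above can be made concrete with a back-of-envelope roofline estimate: in autoregressive decoding, every generated token reads the full weight matrix, so small batches leave the compute units starved. This is a minimal sketch with illustrative hardware numbers (not d-Matrix or GPU specifications):

```python
# Roofline check: why autoregressive decode is memory-bandwidth bound.
# All hardware numbers below are illustrative assumptions.

def arithmetic_intensity(params_b: float, batch: int) -> float:
    """FLOPs per byte moved for one decode step of a dense LLM.

    Each sequence performs ~2 FLOPs per parameter per token, while the
    fp16 weights are read from memory once per step regardless of batch,
    so intensity grows linearly with batch size.
    """
    flops = 2 * params_b * 1e9 * batch    # matmul FLOPs per decode step
    bytes_moved = 2 * params_b * 1e9      # fp16 weights read once
    return flops / bytes_moved            # simplifies to `batch`

# Illustrative accelerator: 300 TFLOP/s peak, 2 TB/s memory bandwidth.
peak_flops, mem_bw = 300e12, 2e12
ridge = peak_flops / mem_bw  # intensity needed to become compute-bound

for batch in (1, 8, 64, 256):
    ai = arithmetic_intensity(params_b=70, batch=batch)
    bound = "compute" if ai >= ridge else "memory-bandwidth"
    print(f"batch={batch:4d}  intensity={ai:6.1f} FLOP/B  -> {bound}-bound")
```

At small batch sizes the intensity sits far below the ridge point, which is why raising memory bandwidth (rather than peak compute) is the lever that matters for generative inference.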
What d-Matrix is doing about it
A New Computing Paradigm is Needed
[Figure: The A.I. Barrier. A traditional architecture keeps its multiply and accumulate units separate from memory, connected by a low-bandwidth link. The Digital-In-Memory-Compute architecture embeds multiply units directly inside the memory arrays, giving high bandwidth between storage and compute.]
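The bandwidth contrast in the diagram comes down to data movement. The toy comparison below counts bytes crossing the memory boundary for a single matrix-vector multiply under each architecture; the matrix size is an illustrative assumption, not a d-Matrix specification:

```python
import numpy as np

# Toy data-movement comparison for y = W @ x.
# Traditional architecture: the weight matrix W is streamed from memory
# to the compute units on every inference.
# In-memory compute: weights stay resident in the memory arrays; only
# the activation vector x and the result y cross the boundary.

rows, cols = 4096, 4096
W = np.random.randn(rows, cols).astype(np.float16)
x = np.random.randn(cols).astype(np.float16)

y = W @ x  # same math either way; only the data movement differs

traditional_bytes = W.nbytes + x.nbytes + y.size * 2  # weights + activations
dimc_bytes = x.nbytes + y.size * 2                    # activations only

print(f"traditional: {traditional_bytes / 1e6:.1f} MB moved per matvec")
print(f"in-memory:   {dimc_bytes / 1e6:.3f} MB moved per matvec")
print(f"reduction:   {traditional_bytes / dimc_bytes:.0f}x")
```

Because weight traffic dominates by orders of magnitude at these shapes, keeping weights stationary inside memory is what lets an in-memory architecture sidestep the low-bandwidth link.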
Three Generations of Proven Silicon
Nighthawk: world's first IMC | compiler + mapper
Jayhawk I: world's first BoW chiplet | 2 TB/s die-to-die bandwidth
Jayhawk II: world's first DIMC + chiplet | 150 TOPS/W, 150 TB/s SRAM bandwidth
Corsair: efficient GenAI inference
Corsair Hardware
Aviator Software
Aviator: enterprise-grade software for easy and fast inference deployment
Easily integrate Aviator with open ecosystem tools or your own deployment stack:
🡪 Convert the model to enable Corsair numerics & sparsity
🡪 Compile and optimize the model to run on Corsair
🡪 Distribute the workload across cards and servers
🡪 Optimize inference runtime and model serving on Corsair
🡪 Orchestrate, manage and monitor Corsair cards and clusters
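The convert / compile / distribute / serve stages above have the shape of a staged deployment pipeline. The sketch below only illustrates that shape; every module, function, and file name here is hypothetical and is not a real d-Matrix or Aviator API:

```python
# Hypothetical sketch of a convert -> compile -> distribute -> serve
# pipeline, mirroring the Aviator workflow stages. All names invented
# for illustration only.

from dataclasses import dataclass


@dataclass
class DeploymentPlan:
    model: str
    cards: int
    servers: int


def convert(model: str) -> str:
    """Stage 1: convert model weights to target numerics/sparsity."""
    return f"{model}.corsair"


def compile_model(converted: str) -> str:
    """Stage 2: compile and optimize the converted model."""
    return f"{converted}.bin"


def distribute(binary: str, cards: int, servers: int) -> DeploymentPlan:
    """Stage 3: shard the compiled workload across cards and servers."""
    return DeploymentPlan(model=binary, cards=cards, servers=servers)


def serve(plan: DeploymentPlan) -> str:
    """Stage 4: launch the inference runtime and report status."""
    return f"serving {plan.model} on {plan.servers}x{plan.cards} cards"


plan = distribute(compile_model(convert("llama-70b")), cards=8, servers=2)
print(serve(plan))
```

Keeping each stage a pure function over the previous stage's artifact is a common design for such stacks, since it lets an orchestration layer retry, cache, or monitor each step independently.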
Current focus: Datacenter Inference
Cloud
On-Premises
GenAI inference: Datacenter Scale
Solution varies based on customer’s datacenter infrastructure, including rack height & rack power density
Working with OEMs to build inference servers with d-Matrix PCIe cards
[Figure: Inference server (4U or 5U) with dual CPUs and PCIe switches connecting the d-Matrix cards.]
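Since the solution varies with rack height and rack power density, a quick sizing helper shows how those two constraints interact. The server power draw below is an illustrative assumption, not a d-Matrix specification:

```python
# Rough rack-fit estimate for inference servers. Server power draw is
# an illustrative assumption; only the 4U/5U form factors come from
# the slide above.

def servers_per_rack(rack_u: int, rack_kw: float,
                     server_u: int, server_kw: float) -> int:
    """Servers per rack, limited by whichever runs out first:
    rack height (in U) or the rack power budget (in kW)."""
    by_space = rack_u // server_u
    by_power = int(rack_kw // server_kw)
    return min(by_space, by_power)


# 42U rack, hypothetical 3.5 kW per 4U server:
# a 17 kW rack is power-limited; a 40 kW rack is space-limited.
print(servers_per_rack(rack_u=42, rack_kw=17.0, server_u=4, server_kw=3.5))
print(servers_per_rack(rack_u=42, rack_kw=40.0, server_u=5, server_kw=3.5))
```

This is why the same PCIe card yields different rack-level configurations per customer: low power-density facilities hit the power ceiling long before the rack is physically full.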
The d-Matrix Advantage
Circuits & Numerics
Compute (DIMC)
Chiplets & Advanced Packaging
Software
Making Generative AI commercially viable
Significant benefits over GPU:
🡪 Better Throughput
🡪 Better Latency
🡪 Better TCO
Build With Us, Partner With Us
INTELLIGENCE DELIVERED™
www.d-matrix.ai