1 of 11

PUNCHY - Not Tooooooooooo Long Super Cool Title

Your Name, …, David Atienza

EPFL - Embedded Systems Laboratory

your.email@epfl.ch

2 of 11

Outline

  • Intro
    • Motivation
    • Background
    • Challenges
  • Topic 1
    • High-throughput requirements
    • Scalability
  • Topic 2
    • Energy-efficient HPC
  • Results
  • Discussion

3 of 11

ESL in a nutshell

Embedded systems and computer engineering

    • Wearables and Internet-of-Things (IoT)
    • Low-power hardware-software co-design
    • System-level design methodologies for HPC

14 years at EPFL, 48 members today

    • 8 post-docs (17 so far, 10 moved to industry)
    • 30 PhD students (21 already graduated)

Innovation and tech. transfer: basic and applied research

    • 31 companies provided grants and donations

Intro

Topic 1

Topic 2

Results

Discussion

4 of 11

Motivation for sustainable acceleration

Context

    • SKAO will operate for 50+ years
    • Sustainability goals: SDP < 1 MW
    • Science goals: high-throughput requirements

Challenge

    • Plan and design energy-efficient data centers
    • Software pipeline is not fixed and keeps evolving

Solution

    • HW-SW codesign
    • Heterogeneous computing
    • Reconfigurable accelerators

5 of 11

Motivation

    • Energy efficient ….

Knowledge gap

    • …..
    • ….

Proposal

6 of 11

Tradeoffs: Programmability vs Energy Efficiency

[Figure: energy efficiency (MOPS/mW, log scale with ticks at 1, 10, 100, 1000) versus flexibility (programming), ranging from near-fixed through low-level programming to high-level programming. Design points shown: ASIC, FPGA, CGRA-like, ASIP, DSIP, DSP, GPU, GPP, CPU. Adapted from Kevin J M Martin, IPDPSW 2022.]

ASIP: Application Specific Instruction Processor

DSIP: Domain Specific Instruction-set Processor

GPP: General Purpose Processor
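
Speaker note: the energy-efficiency axis is in MOPS/mW, which is numerically the same as GOPS/W (operations per nanojoule). The sketch below shows how a single point on that axis could be computed from a measurement. The function name `mops_per_mw` and the numbers in the usage line are hypothetical and are not taken from the figure.

```python
def mops_per_mw(operations, runtime_s, power_w):
    """Return energy efficiency in MOPS/mW (numerically equal to GOPS/W).

    operations: total operations executed by the kernel
    runtime_s:  measured runtime in seconds
    power_w:    average power draw in watts
    """
    if runtime_s <= 0 or power_w <= 0:
        raise ValueError("runtime_s and power_w must be positive")
    mops = operations / runtime_s / 1e6   # millions of operations per second
    milliwatts = power_w * 1e3            # watts to milliwatts
    return mops / milliwatts


# Hypothetical kernel: 2e9 operations in 1 s at an average of 0.5 W.
efficiency = mops_per_mw(2e9, 1.0, 0.5)
```

With these made-up inputs the kernel delivers 2000 MOPS at 500 mW, which would sit between the 1 and 10 ticks of the axis.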

7 of 11

Quest for green acceleration: HW-SW codesign

Three paths to HW-SW codesign, depending on the starting point

    • Only specifications - open choice for a model
    • Existing software implementation
    • Existing hardware (chip – CPU, GPU)

[Diagram: HW-SW codesign flow. Blocks: System specification; Hardware functionality; Hardware architecture; Software requirements; Software design & test; System integration; Integrated modelling; Incremental evaluation; Verification & validation.]

We only have pieces of the puzzle from each

8 of 11

Software: Imaging Pipeline Bottleneck

[Diagram: imaging pipeline, spanning the visibility domain and the image domain. Blocks: Visibilities; Apply Gains; Gain Subtraction; Convolutional Gridding; iFFT; Deconvolution, FFT (CLEAN); Sky model; FFT; Degridding; Residual Image of the Sky; Restore clean image. Loops: Major Cycle (8-30x iterations), Minor Cycle (50-500x iterations). Annotated figures: ~10 Gb/s, ~7 Tb/s, ~100 Pflop.]

Numbers based on MeerKAT, precursor of SKA Mid, Karoo desert.
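
Speaker note: the minor cycle on this slide is a CLEAN deconvolution loop. As a rough illustration only, the sketch below implements a Högbom-style minor cycle in pure Python. The choice of the Högbom variant, the function name `hogbom_minor_cycle`, and the default `gain`, `threshold` and `max_iter` values are assumptions made for this example, not details from the pipeline shown. A production pipeline would operate on large arrays, not on nested lists.

```python
def hogbom_minor_cycle(residual, psf, gain=0.1, threshold=1e-3, max_iter=500):
    """Run Hogbom-style CLEAN iterations on a small 2-D residual image.

    residual: list of rows of floats (the input is not modified)
    psf:      square list of rows, odd side length, value 1.0 at the centre
    Returns (model, residual) after cleaning.
    """
    rows, cols = len(residual), len(residual[0])
    half = len(psf) // 2
    residual = [row[:] for row in residual]        # work on a copy
    model = [[0.0] * cols for _ in range(rows)]

    for _ in range(max_iter):
        # Locate the pixel with the largest absolute residual.
        peak, pr, pc = max(
            (abs(residual[r][c]), r, c)
            for r in range(rows)
            for c in range(cols)
        )
        if peak < threshold:
            break

        # Move a fraction of the peak into the sky model.
        flux = gain * residual[pr][pc]
        model[pr][pc] += flux

        # Subtract the scaled PSF, centred on the peak, from the residual.
        for dr in range(-half, half + 1):
            for dc in range(-half, half + 1):
                r, c = pr + dr, pc + dc
                if 0 <= r < rows and 0 <= c < cols:
                    residual[r][c] -= flux * psf[dr + half][dc + half]

    return model, residual
```

Each iteration scans the whole image for its peak and then touches a PSF-sized window, so one iteration is cheap and the cost comes from repeating it.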

9 of 11

Why Reconfigurable Architectures?

Mark Papermaster: “Advancing EDA Through the Power of AI and High-performance Computing”, DAC59 Keynote, 2022

Hosted by a CPU

    • Design is limited by the bus bandwidth!
      • Carefully design the memory and bus to avoid this bottleneck

Yes, but ...

    • Optimized for a domain
      • Multiple kernels
      • Close to ASICs at the kernel level
    • Efficient execution of the kernels (latency and energy)

[Figure: performance per watt across applications for general-purpose CPU, GPU, and domain-specific architectures. Annotation: "better than ASIC?"]

10 of 11

Next steps: open-source design frameworks

[Figure: open-source design frameworks arranged by maturity level (compile, explore, sim, validate). Frameworks shown: CGRA-ME; Pillars; OpenCGRA; CCF; RIKEN; Morpher + addons (REVAMP). Input languages shown: C; SCALA-based; ADL*; Python, DSL, C/C++; Annotated C; C, OpenMP; Morpher ADL.]

*Architecture Description Language (ADL)

11 of 11

Your Name

EPFL - Embedded Systems Laboratory

your.email@epfl.ch

Thank you!
