Triton for MTIA
Roman Levenstein, Shintaro Iwasaki, Ilia Cherniavskii
Presentation outline
What is MTIA?
Meta Training and Inference Accelerator (MTIA)
More information:
MTIA high-level HW architecture
High-level architecture of the accelerator
PE’s internal organization
MTIA Programming
Motivation for Triton for MTIA
Why Triton for MTIA?
Triton for MTIA feasibility study
Proving feasibility by developing a working prototype to answer the main open questions
Prototype Overview: Software Diagram
Triton Code
Triton-MLIR (MLIR Dialect)
TritonGPU-MLIR (MLIR Dialect)
TritonMTIA-MLIR (MLIR Dialect)
MTIA-MLIR
RISC-V Binary for MTIA
LLVM-MLIR / LLVM-IR
Triton for MTIA Prototype
C/C++
Triton for CUDA
Triton for MTIA Prod
Triton for MTIA Prototype
Triton
MTIA Clang/LLVM
LLVM-MLIR / LLVM-IR
PTXAS/CUBIN
Note: the Triton DSL -> MTIA C++ lowering is done only for quick prototyping purposes; the production Triton for MTIA implementation will be MLIR-based
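To illustrate the input side of this pipeline, below is a minimal Triton vector-add kernel. This is a standard introductory Triton example, not code taken from the prototype itself; in the prototype flow, kernels written in this DSL are lowered to MTIA C++ and compiled with MTIA Clang/LLVM.

```python
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide tile of the input.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask off lanes that fall past the end of the tensors.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```

The same kernel source is target-independent: the backend (CUDA today, MTIA here) decides how program instances map onto hardware, which is what the lowering stages in the diagram implement.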
Prototype Functionality Coverage
Prototype performance
Execution Time. Lower is better.
Next steps
Productization of Triton for MTIA
Triton Code
Triton-MLIR (MLIR Dialect)
TritonGPU-MLIR (MLIR Dialect)
TritonMTIA-MLIR (MLIR Dialect)
MTIA-MLIR
RISC-V Binary for MTIA
LLVM-MLIR / LLVM-IR
FILTER-IR
KNYFE Code (BLOCK)
KNYFE DSL
Triton
MTIA Clang/LLVM
LLVM-MLIR / LLVM-IR
PTXAS/CUBIN
Triton for CUDA
Triton for MTIA
KNYFE (DSL for MTIA)
Shared between Triton and KNYFE
Open questions about Triton support for heterogeneous HW
Need for improved support for memory subsystems
Need for improved support for asynchronous execution
Need for support for cross-PE primitives and PE topology-aware codegen for custom HW targets
Conclusion