GPU Programming Model
Dr A Sahu
Dept of Comp Sc & Engg.
IIT Guwahati
Outline
Graphics System
3D application -> 3D API commands -> 3D API: OpenGL, DirectX/3D
CPU-GPU Boundary
GPU Command & Data Stream -> GPU Command
GPU Command -> Vertex Index Stream -> Primitive Assembly
GPU Command -> Pretransformed Vertices -> Programmable Vertex Processor -> Transformed Vertices -> Primitive Assembly
Primitive Assembly -> Assembled polygons, lines & points -> Rasterization & Interpolation
Rasterization & Interpolation -> Pixel Location Stream -> Raster Operations
Rasterization & Interpolation -> Rasterized Pretransformed Fragments -> Programmable Fragment Processor -> Transformed Fragments -> Raster Operations
Raster Operations -> Pixel Updates -> Frame Buffer
Graphics System
Memory System
Texture Memory
Frame Buffer
Vertex Processing
Pixel Processing
Vertices (x, y, z)
Pixel (R, G, B)
Vertex Shader
Pixel Shader
The Graphics Pipeline
Modeling Transformations
Illumination (Shading)
Viewing Transformation (Perspective / Orthographic)
Clipping
Projection (to Screen Space)
Scan Conversion (Rasterization)
Visibility / Display
Programmable Graphics Hardware
Vertex Shader
Pixel Shader
Object space
Window space
Framebuffer
IN
OUT
Textures
GPU vs CPU
NVIDIA GeForce GTX 480
Generation IV: Radeon 9700/GeForce FX (2002)
AGP
Vertex Transforms (Programmable Vertex Shader)
Primitive Assembly
Rasterization and Interpolation
Programmable Fragment Processor
Raster Operations
Frame Buffer
High-level shading language
Memory Hierarchy
Disk
CPU Main Memory
GPU Video Memory
CPU Caches
CPU Registers
GPU Caches
GPU Temporary Registers
GPU Constant Registers
GPU Memory Model
CPU Memory Model
GPU Memory Model
Vertex Buffer -> Vertex Processor -> Rasterizer -> Fragment Processor -> Frame Buffer(s)
Texture
VS 3.0 GPUs
GPU Memory API
Vertex Buffers
Textures
Framebuffer
Programming Model: Early GPUs
DirectX and OpenGL
Programmability in GPUs
GPU Pipeline
Vertex Data -> Vertex Shader -> Rasterize To Pixels -> Fragment Shader -> Output
Shader Languages
Shader Unification
Unified Shader Pipeline (DX10, OpenGL 2, OpenGL 3)
Vertex Programs
Geometry Programs
Pixel Programs
Compute Programs
Rasterization
Hidden Surface Removal
GPU
Programmable Unified Processors
GPU memory (DRAM)
Final Image
3D Geometric Primitives
Generalized GPU programming
OpenCL, DirectCompute
DirectX11, OpenGL4
Modern GPU computing
Motivation: Computational Power
Motivation: Flexible and Precise
Problems: Difficult To Use
Programming a GPU for Graphics
Programming a GPU for GP Programs
Nvidia CUDA
GeForce 8800 Specs
Typical NVIDIA GPU Device Layout
Host
Input Assembler
Thread Execution Manager
Parallel Data Cache (repeated unit)
Texture (repeated unit)
Load/store (repeated unit)
Global Memory
CUDA Execution Model
CUDA Memory Model
Host
Grid
  Global Memory
  Block (0, 0)
    Shared Memory
    Thread (0, 0): Registers
    Thread (1, 0): Registers
  Block (1, 0)
    Shared Memory
    Thread (0, 0): Registers
    Thread (1, 0): Registers
How Do You Execute a CUDA Kernel?
(From a C/C++ function)
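A minimal sketch of what launching a kernel from a C/C++ function involves, written as plain C++ that emulates the launch on the CPU. The names `ThreadCtx`, `vecAddKernel`, and `launchVecAdd` are hypothetical, and vector addition is an assumed example, not one taken from the slides. In real CUDA the kernel is a `__global__` function, the index values come from the built-in `blockIdx`, `blockDim`, and `threadIdx` variables, and the host launches it with `kernel<<<gridDim, blockDim>>>(args)`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's built-in index variables (1D case only).
struct ThreadCtx {
    int blockIdx;   // which block this thread belongs to
    int blockDim;   // number of threads per block
    int threadIdx;  // position of the thread inside its block
};

// "Kernel" body: each logical thread handles at most one element.
// In CUDA this would be declared __global__.
void vecAddKernel(const ThreadCtx& t, const float* a, const float* b,
                  float* c, int n) {
    int i = t.blockIdx * t.blockDim + t.threadIdx;  // global thread index
    if (i < n) {                                    // guard the partial last block
        c[i] = a[i] + b[i];
    }
}

// Hypothetical launcher: visits every (block, thread) pair one after another.
// A real launch runs these logical threads in parallel on the GPU.
std::vector<float> launchVecAdd(const std::vector<float>& a,
                                const std::vector<float>& b,
                                int blockDim) {
    assert(a.size() == b.size() && blockDim > 0);
    int n = static_cast<int>(a.size());
    int gridDim = (n + blockDim - 1) / blockDim;  // ceil(n / blockDim) blocks
    std::vector<float> c(a.size(), 0.0f);
    for (int bx = 0; bx < gridDim; ++bx) {
        for (int tx = 0; tx < blockDim; ++tx) {
            vecAddKernel(ThreadCtx{bx, blockDim, tx},
                         a.data(), b.data(), c.data(), n);
        }
    }
    return c;
}
```

With 5 elements and 2 threads per block, the launcher creates 3 blocks (6 logical threads), and the bounds check makes the sixth thread do nothing.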
CUDA In Action: Matrix Multiplication
Matrix Multiplication On CUDA
Figure: P = M × N, where M, N, and P are each WIDTH × WIDTH matrices
Matrix Multiplication On CUDA Code
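The following is a hedged sketch of a simple one-thread-per-element matrix multiplication, emulated in plain C++ on the CPU. It assumes a single WIDTH × WIDTH block of threads and row-major storage; `matMulKernel` and `matMulSingleBlock` are hypothetical names, while `M`, `N`, `P`, and `WIDTH` follow the figure labels. In CUDA, `tx` and `ty` would be `threadIdx.x` and `threadIdx.y`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Kernel" body for the logical thread (tx, ty): computes element P[ty][tx].
// All matrices are WIDTH x WIDTH and stored row-major in flat arrays.
void matMulKernel(int tx, int ty, const float* M, const float* N,
                  float* P, int WIDTH) {
    float sum = 0.0f;
    for (int k = 0; k < WIDTH; ++k) {
        // Row ty of M times column tx of N.
        sum += M[ty * WIDTH + k] * N[k * WIDTH + tx];
    }
    P[ty * WIDTH + tx] = sum;
}

// Hypothetical launcher emulating one WIDTH x WIDTH thread block.
std::vector<float> matMulSingleBlock(const std::vector<float>& M,
                                     const std::vector<float>& N,
                                     int WIDTH) {
    assert(WIDTH > 0);
    assert(M.size() == static_cast<std::size_t>(WIDTH) * WIDTH);
    assert(N.size() == M.size());
    std::vector<float> P(M.size(), 0.0f);
    for (int ty = 0; ty < WIDTH; ++ty) {
        for (int tx = 0; tx < WIDTH; ++tx) {
            matMulKernel(tx, ty, M.data(), N.data(), P.data(), WIDTH);
        }
    }
    return P;
}
```

Because every thread lives in the same block, the matrix size is capped by the hardware limit on threads per block, and each thread reads a full row of M and a full column of N from global memory.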
Limitations of This Approach
CUDA Blocks
Blocks Diagram
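Figure: Pd = Md × Nd, each a WIDTH × WIDTH matrix. Pd is partitioned into TILE_WIDTH × TILE_WIDTH sub-matrices (Pdsub). Block indices bx, by (0, 1, 2, ...) select a tile; thread indices tx, ty (0, 1, 2, ..., TILE_WIDTH-1) select an element within the tile.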
Matrix Multiplication Using Blocks
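A hedged sketch of the blocked (tiled) scheme, again emulated in plain C++ on the CPU. It assumes WIDTH is a multiple of TILE_WIDTH and row-major storage. `matMulTiled`, `Mds`, `Nds`, and `acc` are hypothetical names; `Md`, `Nd`, `Pd`, `bx`, `by`, `tx`, `ty`, `WIDTH`, and `TILE_WIDTH` follow the figure labels. In CUDA, `Mds` and `Nds` would be `__shared__` arrays, and the two phases would be separated by `__syncthreads()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<float> matMulTiled(const std::vector<float>& Md,
                               const std::vector<float>& Nd,
                               int WIDTH, int TILE_WIDTH) {
    // This sketch assumes WIDTH is a multiple of TILE_WIDTH.
    assert(TILE_WIDTH > 0 && WIDTH > 0 && WIDTH % TILE_WIDTH == 0);
    assert(Md.size() == static_cast<std::size_t>(WIDTH) * WIDTH);
    assert(Nd.size() == Md.size());

    const int T = TILE_WIDTH;
    const int tiles = WIDTH / T;
    std::vector<float> Pd(Md.size(), 0.0f);
    std::vector<float> Mds(T * T), Nds(T * T);  // stand-ins for shared memory
    std::vector<float> acc(T * T);              // one running sum per thread

    for (int by = 0; by < tiles; ++by) {
        for (int bx = 0; bx < tiles; ++bx) {    // one iteration = one block
            std::fill(acc.begin(), acc.end(), 0.0f);
            for (int m = 0; m < tiles; ++m) {
                // Phase 1: each thread loads one element of each input tile.
                for (int ty = 0; ty < T; ++ty) {
                    for (int tx = 0; tx < T; ++tx) {
                        int row = by * T + ty;
                        int col = bx * T + tx;
                        Mds[ty * T + tx] = Md[row * WIDTH + (m * T + tx)];
                        Nds[ty * T + tx] = Nd[(m * T + ty) * WIDTH + col];
                    }
                }
                // (a CUDA kernel would synchronize the block here)
                // Phase 2: each thread accumulates a partial dot product
                // using only the tiles held in "shared memory".
                for (int ty = 0; ty < T; ++ty) {
                    for (int tx = 0; tx < T; ++tx) {
                        for (int k = 0; k < T; ++k) {
                            acc[ty * T + tx] += Mds[ty * T + k] * Nds[k * T + tx];
                        }
                    }
                }
                // (and synchronize again before the next tile is loaded)
            }
            // Each thread writes its finished element of Pdsub.
            for (int ty = 0; ty < T; ++ty) {
                for (int tx = 0; tx < T; ++tx) {
                    Pd[(by * T + ty) * WIDTH + (bx * T + tx)] = acc[ty * T + tx];
                }
            }
        }
    }
    return Pd;
}
```

In this scheme each element of Md and Nd is read from global memory WIDTH / TILE_WIDTH times instead of WIDTH times, which is the main reason tiling reduces global memory traffic.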
Matrix Multiplication Speed Analysis
How GPU Executes Code
GPU Constraints – Memory Speed
GPU Constraints – Memory Size
GPU Constraints – Thread Count
Larrabee
Thanks