CUDA Ray Tracer
Abhi S.
Noah K.
Jared G.
What is CUDA?
Ray Tracing
[Figure: rays cast from the camera, with numbered bounces 1, 2, 3 and a ray that misses]
Comparison
[Figure: Serial vs. CUDA comparison]
Performance with Varying Sphere Count
CUDA Memory Management
CUDA offers a "Unified Memory" abstraction (cudaMallocManaged) that gives the host and the device a single shared address space and migrates data between them automatically.
Our problem was simple enough that we could manage device memory manually.
Steps:
For each frame:
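A minimal sketch of one common manual pattern, assuming a single float frame buffer (the authors' exact steps may differ): allocate device memory once, then for each frame launch the kernel and copy the result back with cudaMemcpy, and free the buffer at the end. shadePixel is a hypothetical stand-in for the real ray tracing work, HD is a hypothetical helper macro, and checking of the cudaError_t return values is omitted for brevity. Under a plain C++ compiler the #else branch computes the same pixels on the CPU, so the sketch also builds without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#define HD __host__ __device__  // callable from host and device under nvcc
#else
#define HD                       // plain C++: ordinary host functions
#endif

// Hypothetical stand-in for the real per-pixel ray tracing work.
HD float shadePixel(int x, int y, int frame) {
    return static_cast<float>((x + y + frame) % 256) / 255.0f;
}

#ifdef __CUDACC__
__global__ void raytraceKernel(float* frameBuffer, int width, int height, int frame) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) frameBuffer[y * width + x] = shadePixel(x, y, frame);
}
#endif

// Renders `frames` frames; afterwards `host` holds the last frame.
void renderFrames(std::vector<float>& host, int width, int height, int frames) {
    host.assign(static_cast<std::size_t>(width) * height, 0.0f);
#ifdef __CUDACC__
    const std::size_t bytes = host.size() * sizeof(float);
    float* device = nullptr;
    cudaMalloc(&device, bytes);                     // 1. allocate device memory once
    dim3 threads(8, 8);
    dim3 blocks((width + 7) / 8, (height + 7) / 8);
    for (int f = 0; f < frames; ++f) {              // 2. for each frame:
        raytraceKernel<<<blocks, threads>>>(device, width, height, f);
        cudaMemcpy(host.data(), device, bytes,      //    copy back; waits for the kernel
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(device);                               // 3. release device memory
#else
    // CPU fallback producing the same pixels, so the sketch is testable without nvcc.
    for (int f = 0; f < frames; ++f)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x)
                host[y * width + x] = shadePixel(x, y, f);
#endif
}
```

Scene data such as the sphere list would be copied to the device the same way, once, with cudaMemcpyHostToDevice.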
CUDA Abstraction
Threads < Thread Blocks < Grid: threads are grouped into thread blocks, and the blocks of one kernel launch make up the grid.
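Inside a kernel, each thread combines these levels into a unique pixel coordinate. The sketch below writes the usual 2-D mapping as a plain function so it can be checked on the host; Idx2 and pixelCoord are hypothetical names, with Idx2 standing in for the x/y fields of CUDA's built-in blockIdx, blockDim, and threadIdx.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__  // usable from device code when compiled with nvcc
#else
#define HD
#endif

// Hypothetical stand-in for the x/y fields of CUDA's uint3/dim3 built-ins.
struct Idx2 { unsigned x, y; };

// Global pixel coordinate of a thread: block index times block size, plus the
// thread's position inside its block. In a kernel this is
//   x = blockIdx.x * blockDim.x + threadIdx.x   (and likewise for y).
HD Idx2 pixelCoord(Idx2 block, Idx2 blockSize, Idx2 thread) {
    return { block.x * blockSize.x + thread.x,
             block.y * blockSize.y + thread.y };
}
```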
CUDA Architecture
[Diagram: a global L2 cache shared by SM 1 to SM 4, each SM with its own L1 cache]
In hardware, each SM executes its threads in groups called warps.
[Diagram: SM 1 running warps W0 to W3, each made of threads T0, T1, T2, T3, ...]
A warp is a group of 32 threads within a block.
- All threads in a warp execute the same instruction at the same time.
- Context switches happen for whole warps: the SM's scheduler switches between warps, not individual threads.
- Threads in a warp that take different branches are serialized (thread-divergent branches).
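Which threads share a warp follows from CUDA's documented ordering: a thread's linear ID in a 2-D block is x + y * blockDim.x, and each warp takes 32 consecutive IDs. The helpers below are hypothetical names that make this explicit so it can be checked on the host.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

constexpr unsigned kWarpSize = 32;  // warp size on current NVIDIA GPUs

// Linear thread ID inside a 2-D block; x varies fastest.
HD unsigned linearThreadId(unsigned tx, unsigned ty, unsigned blockDimX) {
    return ty * blockDimX + tx;
}

// Warp index within the block: consecutive runs of 32 linear IDs.
HD unsigned warpIndex(unsigned tx, unsigned ty, unsigned blockDimX) {
    return linearThreadId(tx, ty, blockDimX) / kWarpSize;
}

// Position of the thread inside its warp (0 to 31).
HD unsigned laneIndex(unsigned tx, unsigned ty, unsigned blockDimX) {
    return linearThreadId(tx, ty, blockDimX) % kWarpSize;
}
```

For an 8x8 block, rows 0 to 3 form warp 0 and rows 4 to 7 form warp 1, so each warp covers an 8x4 tile of pixels.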
Thread-Divergent Branches
if (threadIdx.x % 2)
    a();
else
    b();
c();
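a() and b() won't run in parallel on threads in the same warp: the odd threads run a() while the even threads wait, then the even threads run b() while the odd threads wait.
The threads normally reconverge afterwards and run c() together, but the if/else takes as long as a() and b() combined, so each thread does useful work only half of that time.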
Thread-divergent branches are bad for parallelism.
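The cost can be illustrated with a simplified SIMT model (an assumption for illustration, not a hardware measurement): a warp pays for every branch path that at least one of its lanes takes. ifElseCost is a hypothetical helper; it compares the slide's threadIdx.x % 2 condition with one that is uniform within each warp.

```cpp
#include <cassert>
#include <functional>

constexpr int kWarpSize = 32;

// Simplified SIMT cost model for `if (pred) a(); else b();` executed by one warp
// of a 1-D block: every path taken by at least one lane runs, one after the
// other, with the remaining lanes masked off.
int ifElseCost(int warp, const std::function<bool(int)>& pred, int costA, int costB) {
    bool anyA = false, anyB = false;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        int tid = warp * kWarpSize + lane;  // threadIdx.x of this lane
        if (pred(tid)) anyA = true; else anyB = true;
    }
    return (anyA ? costA : 0) + (anyB ? costB : 0);
}
```

Branching on (threadIdx.x / 32) % 2 instead keeps every warp on a single path, so no warp pays for both a() and b().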
CUDA Kernel Configuration
[Figure: rays cast from the camera, with numbered bounces 1, 2, 3 and a ray that misses]
Observation:
Every pixel in the orange square has the same runtime, since its rays make 0 bounces.
[Figure: the rendered image divided into square tiles, one highlighted in orange]
Observation:
Pixels within a square will generally have a similar # of bounces.
[Figure: the image's tile grid, with one tile marked]
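Observation: nearby pixels will have a similar number of bounces.
Each warp is assigned 32 threads,
so we make blocks of 8x8 pixels: each block holds 2 warps, and each warp covers an 8x4 tile of neighboring pixels.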
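dim3 threads(8, 8);  // 64 threads per block = 2 warps
// Round up so edge pixels are covered when width/height are not multiples of 8
dim3 blocks((width + threads.x - 1) / threads.x,
            (height + threads.y - 1) / threads.y);
raytraceKernel<<<blocks, threads>>>(...);  // kernel must skip x >= width, y >= height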
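Why round the grid size up: the host-side emulation below enumerates every thread a 2-D launch would start and counts the in-bounds pixels it writes, assuming each thread handles pixel blockIdx * blockDim + threadIdx and skips out-of-range ones. countCoveredPixels is a hypothetical helper; with width / 8 a 100x60 image loses its right and bottom edges.

```cpp
#include <cassert>

// Counts the in-bounds pixels written by a launch of gridX x gridY blocks of
// blockX x blockY threads. Each thread maps to a distinct pixel, so this is
// also the number of distinct pixels rendered.
long countCoveredPixels(int width, int height,
                        int gridX, int gridY, int blockX, int blockY) {
    long covered = 0;
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    int x = bx * blockX + tx;
                    int y = by * blockY + ty;
                    if (x < width && y < height) ++covered;  // the kernel's bounds check
                }
    return covered;
}
```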
Scatter
[Figure: No scatter: too perfect. With scatter: too noisy.]
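A minimal sketch of roughness-controlled scatter, assuming the next direction is the mirror reflection about the normal plus a random vector (uniform inside the unit sphere) scaled by a roughness in [0, 1], then renormalized. Vec3, reflect, randomInUnitSphere, and scatter are hypothetical names, and std::mt19937 is used only so the sketch runs on the host; on the GPU a per-thread generator such as cuRAND would take its place.

```cpp
#include <cassert>
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

// Mirror reflection of direction d about unit normal n: r = d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - (2.0f * dot(d, n)) * n; }

// Random vector uniformly distributed inside the unit sphere (rejection sampling).
Vec3 randomInUnitSphere(std::mt19937& rng) {
    std::uniform_real_distribution<float> u(-1.0f, 1.0f);
    while (true) {
        Vec3 p{u(rng), u(rng), u(rng)};
        if (dot(p, p) < 1.0f) return p;
    }
}

// Next ray direction. roughness = 0 gives a perfect mirror ("too perfect");
// larger values blur the reflection. Assumes 0 <= roughness <= 1 so the sum
// never collapses to a zero vector.
Vec3 scatter(Vec3 d, Vec3 n, float roughness, std::mt19937& rng) {
    return normalize(reflect(d, n) + roughness * randomInUnitSphere(rng));
}
```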
Accumulation
[Figure: Frame 1 + Frame 2 + Frame 3 + ... are accumulated. No accumulation: too noisy. With accumulation: just right :)]
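Accumulation can be sketched as a per-pixel running average: each frame's sample is added to a running sum, and the displayed value is the sum divided by the number of frames so far, so the noise introduced by scatter averages out. accumulate, the sum buffer, and the 1-based frameIndex are hypothetical; the sketch handles one color channel, a kernel would call it once per pixel each frame, and the sum would typically be cleared whenever the camera or scene changes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Adds this frame's sample for `pixel` to its running sum and returns the
// value to display: the mean over all frames so far (frameIndex starts at 1).
HD float accumulate(float* sum, std::size_t pixel, float sample, int frameIndex) {
    sum[pixel] += sample;
    return sum[pixel] / static_cast<float>(frameIndex);
}
```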
Ray Tracing
Collisions and Light Contribution
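Light: on intersection, the ray takes the dot product of the surface normal and the direction to the light, and scales the surface color by it to get the light contribution.
Specular: after the intersection, a new ray is cast from the hit point in the incoming direction reflected about the surface normal.
Scattering: a random, uniformly distributed vector, scaled by the roughness, is added to the next ray's direction.
Intersect: take the dot product of the ray direction with the vector from the ray origin to the sphere center. Using the radius and the Pythagorean theorem, compute the half chord length, then subtract it from and add it to that projection to get the near and far hit distances.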
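A minimal sketch of the intersection and light steps, assuming a unit-length ray direction and a Lambertian (cosine) light term clamped at zero. Vec3, sub, dot, intersectSphere, and lightContribution are hypothetical names; returning -1 for a miss is an illustrative convention.

```cpp
#include <cassert>
#include <cmath>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct Vec3 { float x, y, z; };

HD Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
HD float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Geometric ray-sphere test; `dir` must be unit length.
// Returns the distance to the nearest hit in front of the origin, or -1 on a miss.
HD float intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, float radius) {
    Vec3 toCenter = sub(center, origin);
    float tca = dot(toCenter, dir);                   // projection of the center onto the ray
    float d2 = dot(toCenter, toCenter) - tca * tca;   // squared ray-to-center distance (Pythagoras)
    float r2 = radius * radius;
    if (d2 > r2) return -1.0f;                        // ray passes outside the sphere
    float thc = std::sqrt(r2 - d2);                   // half chord length (Pythagoras again)
    float t0 = tca - thc;                             // subtract: near hit
    float t1 = tca + thc;                             // add: far hit (origin inside sphere)
    if (t0 > 0.0f) return t0;
    if (t1 > 0.0f) return t1;
    return -1.0f;                                     // sphere is behind the ray
}

// Lambertian light term: color scaled by cos(angle) between the unit normal and
// the unit direction to the light, clamped to zero when facing away.
HD Vec3 lightContribution(Vec3 color, Vec3 normal, Vec3 toLight) {
    float k = std::fmax(dot(normal, toLight), 0.0f);
    return {color.x * k, color.y * k, color.z * k};
}
```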