Notes From Rendering Textbook

Rendering basics

The System

This is a general description of how things get rendered. The following are stages a program goes through.

1. Application: a program running on your CPU.
2. Geometry: what, how, and where things are drawn. Projection and transformations occur here.
3. Rasterizer: creates a color image. Per-pixel computation occurs here.

During the application stage, Rendering Primitives are created and sent to the Geometry stage.

Rendering Primitives are points, lines, and triangles.

Geometry Stage (Rendering Pipeline)

1. Model and View transformations
2. Vertex shading
3. Geometry shading
4. Projection
5. Clipping
6. Screen mapping

Model and View transformations

• Model space is the point of view from a single 3D model.
• World space is the point of view from the entire game/world
• View space is what the camera can see, what ultimately will be rendered.
• Models are transformed from model space to world space.
• The world is transformed to view space.

Although we think of the camera moving around the world, the math works out better if we move THE ENTIRE WORLD and leave the camera at the center.
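
A minimal Python sketch of the "move the world, not the camera" idea. It only handles translation (no camera rotation), and the function name world_to_view is made up for illustration.

```python
def world_to_view(point, camera_position):
    """Translate a world-space point so the camera ends up at the origin.

    Camera rotation is left out to keep the sketch short.
    """
    return tuple(p - c for p, c in zip(point, camera_position))


# A camera at x = 10 looking at a point 5 units in front of it.
view_point = world_to_view((10.0, 0.0, 5.0), (10.0, 0.0, 0.0))
```

Moving the camera right by 10 and moving the whole world left by 10 give the same picture, which is why the subtraction is enough here.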

Next comes vertex shading:

• Process model material and prepare for shading.
• For each vertex, calculate some data and assign it to that vertex.
• This data will later be interpolated across each triangle the vertices form.
• Often the model, world, and view transformations are done here, along with the projection(s).

Covered in detail later.

The geometry shader stage may change the number of primitives produced, creating new triangles or deleting existing ones.

Projection

• Transforms view volume into unit cube.
• View Volume is the entire volume of space visible to the camera.
• Think of your field of vision and how far away you can see.
• Everything the camera can see is represented as a 3D volume, and this is called the view volume.
• Unit Cube is a 3D cube centered at the origin, with a width, height, and depth of 2 (so each axis runs from -1 to 1).
• It's like a unit circle, but a cube.
• The view volume is squashed and stretched to become a cube.

There are a couple common view volumes, and when they are stretched into a unit cube they perform specific kinds of projection.

A projection is a mathematical term meaning a 3D space (our world) becomes 2D (for our computer screen).

Orthographic projections happen when the view volume is a rectangular box (or a cube). Things in the distance appear just as big as things in the foreground, which means no sense of depth is rendered. This is good for 2D games!

Perspective projections happen when the view volume is a frustum (a pyramid with the pointed top chopped off). Depth is rendered, things in the distance appear smaller than things in the foreground, just like real life!

Clipping

Everything outside of the unit cube is thrown out.

If something is partly in the unit cube, it is cut along the plane(s) it's intersecting to make new triangles so that -everything- is in the cube.

Sectioning is the process of cutting up a primitive to make it fit in the cube.

Screen Mapping

The unit cube is scaled so that it fits the screen. The x and y coordinates of all the vertices in the unit cube are mapped to be between 0 and the screen width/height respectively. The z coordinate remains between -1 and 1 and is passed along.
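
A minimal Python sketch of screen mapping, assuming the unit cube runs from -1 to 1 on each axis. The function name is made up, and real APIs differ on details like whether y points up or down.

```python
def screen_map(x_ndc, y_ndc, z_ndc, width, height):
    """Map a unit-cube point (each axis in [-1, 1]) to screen coordinates."""
    x_screen = (x_ndc + 1.0) * 0.5 * width   # [-1, 1] -> [0, width]
    y_screen = (y_ndc + 1.0) * 0.5 * height  # [-1, 1] -> [0, height]
    return x_screen, y_screen, z_ndc         # z is passed along unchanged


center = screen_map(0.0, 0.0, 0.5, 800, 600)  # (400.0, 300.0, 0.5)
```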

Rasterizer Stage

Finally the image is rendered given all the 3D data and transformations from above. We're working with the resulting unit cube and the 3D objects in it.

1. Triangle Setup
2. Triangle Traversal (Scan Conversion)
3. Pixel (fragment) shading
4. Merging

Triangle Setup

Differentials for the triangle are calculated (no further information is given, I don't know what this means), and other data is computed for the triangle, such as the values needed for interpolation.

Triangle Traversal

Also called Scan Conversion.

For each pixel with its center (aka its sample) located in a triangle, a fragment is generated. The triangle's data from the geometry stage, along with interpolated data, is saved with that fragment.
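
A minimal Python sketch of triangle traversal, assuming 2D screen-space vertices and one sample at each pixel center. It uses edge functions and barycentric weights, which is one common way to do this, not necessarily what any given GPU does. All names are made up.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: tells which side of the edge a -> b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, colors, width, height):
    """Return {(x, y): color} for each pixel whose center lies inside tri."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    fragments = {}
    if area == 0:
        return fragments  # degenerate triangle covers nothing
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            # Barycentric weights; dividing by area handles either winding.
            w0 = edge(x1, y1, x2, y2, px, py) / area
            w1 = edge(x2, y2, x0, y0, px, py) / area
            w2 = edge(x0, y0, x1, y1, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Interpolate the per-vertex colors with the weights.
                fragments[(x, y)] = tuple(
                    w0 * c0 + w1 * c1 + w2 * c2 for c0, c1, c2 in zip(*colors)
                )
    return fragments


frags = rasterize(
    [(0, 0), (4, 0), (0, 4)],
    [(1, 0, 0), (0, 1, 0), (0, 0, 1)],  # red, green, blue corners
    4, 4,
)
```

Each entry in frags plays the role of a fragment: a pixel location plus data interpolated from the triangle's vertices.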

The pixel (fragment) shader is a custom program that modifies the color of each fragment, which changes the final color the 3D object will have.

Per-pixel data is computed here. You can do cool things like texturing and even cause effects based on where in the screen the pixel is!

Merging

Each fragment writes out some color data to a color buffer and its z value to a z buffer and stencil value to a stencil buffer. Which color gets saved to the color buffer is determined through this stage. The most common technique is to compare depth values.

If the current fragment color being written is from a triangle far away, and a fragment was already written to this pixel location from a closer triangle, the color is not saved. This results in closer objects being drawn on top of distant objects.
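
A minimal Python sketch of the default depth test, assuming smaller z means closer and the z buffer starts cleared to the farthest value. Buffer layout and names are made up.

```python
def merge_fragment(color_buffer, z_buffer, x, y, color, z):
    """Write a fragment only if it is closer than what is already stored."""
    if z < z_buffer[y][x]:  # "less than" depth test: smaller z is closer
        z_buffer[y][x] = z
        color_buffer[y][x] = color
        return True
    return False


WIDTH, HEIGHT = 2, 2
FAR = 1.0  # buffers start cleared to the farthest depth
color_buffer = [[(0, 0, 0)] * WIDTH for _ in range(HEIGHT)]
z_buffer = [[FAR] * WIDTH for _ in range(HEIGHT)]

merge_fragment(color_buffer, z_buffer, 0, 0, (255, 0, 0), 0.2)  # near, red: kept
merge_fragment(color_buffer, z_buffer, 0, 0, (0, 0, 255), 0.8)  # far, blue: rejected
```

Swapping the `<` for `>` (or skipping the test) is the kind of configuration this stage allows.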

This stage is very configurable. You can change how the depth test works (maybe farther stuff gets drawn on top, or the depth test is ignored), if and how a stencil test works, or maybe the color being written should be mixed with whatever color is already there!

Blend Operations

Also called Raster Operations. These are tools and techniques to modify how colors are written to the color buffer.

Alpha (transparency) data can be saved and tested on too. A common technique is to remove fragments that are fully transparent so they don't overwrite the z buffer.

A stencil buffer can be written to as well, then read and tested against to see if colors should be written to the color buffer. You could draw a circle to the stencil buffer and then set a test so that only pixels inside that circle are rendered.

Frame Buffer

This can either refer to the color and z buffers as a single package, or refer to a buffer of images to do effects like motion blur and anti aliasing.

The GPU

This section covers the configurable and programmable parts of graphics rendering. It covers some of the same topics from the last section, but is a little more technical.

The Configurable Graphics Pipeline

In software and hardware design, a pipeline is a specific kind of system where data flows from one module to the next, where each module performs some tasks and modifies the data. Some also say the data is filtered as it goes along and these filters can be swapped out without changing the entire system.

1. Vertex shader
2. Geometry shader
3. Clipping+
4. Screen mapping*
5. Triangle setup*
6. Triangle Traversal*
7. Fragment (pixel) shader
8. Merger+

+ means the stage is configurable.

* means the stage cannot be modified at all

The rest of the stages are fully programmable.

Each single draw call will cause this entire pipeline to run once for all the primitives passed into the draw call.

The Processor Itself

Vendors and chips and cards

High end GPUs are generally made by 2 companies - NVIDIA and AMD.

The graphics card is honestly like a little computer. Many companies use the chips designed by these two makers to build the physical card itself. This is why there are mostly 2 types of chips (GeForce, Radeon) but dozens of card brands to buy from (ASUS, MSI, etc.)

Abstracting the Hardware

The GPU can only understand machine code, which is defined by the chip designers (NVIDIA and AMD), and they both have different machine languages.

The problem is graphics programmers need to write code for both of these chipsets, and writing assembly (or especially machine code) is incredibly impractical.

The solution is to use two software design ideas: abstraction and virtual machines.

Abstraction means we are generalizing how the chipset works and using a higher level language to write code. The tradeoff is we no longer have direct access to the processor and lose the potential for extreme optimization, but we have a much easier time writing code.

A virtual machine means we are representing the chipset as a simplified, generalized processor instead of worrying about the exact details of the specific processor and card.

A graphics driver translates the higher level language into machine code instructions for the GPU.

OpenGL vs DirectX vs Vulkan

These are all higher-level graphics APIs (loosely called graphics languages in these notes): the virtual machines we can write code to.

Some may offer certain tools the others don't have, or certain optimizations, but at the end of the day it's like Java versus C#. They both offer the same basics and advanced tools, just under different wording.

There's tons of fascinating history between OpenGL and DirectX that I want to read about.

Vulkan is the newest of the three. It offers a lot of specific kinds of optimizations and appears to be the future of graphics programming.

What is a software binding?

You may come across this term when looking at tutorials or software. An OLD piece of software called GLUT advertises itself as an OpenGL binding for C++.

A software binding means a tool that allows a programming language (C++, Java, C#) to use a graphics language (OpenGL, DirectX).

Graphics Programming Basics

These apply to all graphics languages.

There are a set of coding primitives that the GPU understands.

• Registers contain 4 independent 32 bit single precision floats.
• 32 bit integers
• Aggregate types (structures, arrays, matrices)

Inputs

Programmable shaders have flexible inputs that come in a few varieties.

• Uniform - The value remains the same over the entire draw call.
• Varying - Inputs that change per pixel or per vertex.
• Texture - A special Uniform input. Textures used to be just for images, but are now used as any collection of data.

Remember these are inputs to the GPU. Programs you write can use a more standard set of types - float, int, matrix, and others. This is simply how to send additional data to the rendering pipeline during a draw call.

Outputs

Outputs are very constrained and have fewer options. (The book really cut it short here…)

Note on Performance and Optimization

A big part of graphics programming is trading off render speed for quality, but as a student I want to emphasize the need to focus on function over optimization. Get your demos working and running at all, then look at optimizations. Don't do complex optimizations right away.

Flow Control

Flow control means using if, else, and switch statements to create branching logic in a program.

CPUs are amazing at this. GPUs are not so good at it because they operate on multiple pieces of data at the same time, so they can't easily manipulate some data and not others.

The exception is static control, which means a uniform value is used as the conditional so the program will know which path to take for all data for a single draw call.

Dynamic control uses varying inputs for the conditional so the program could go either way for each pixel or vertex. The GPU handles this by calculating the results for both paths, then throwing one result out depending on the condition.
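
A minimal Python sketch of how dynamic flow control behaves, assuming a group of "lanes" (pixels or vertices) that must all run the same instructions. The two branch bodies are arbitrary placeholders and the function name is made up.

```python
def shade_lanes(values, threshold):
    """Emulate lockstep lanes evaluating `v * 2 if v > threshold else v + 10`."""
    mask = [v > threshold for v in values]     # per-lane condition
    then_results = [v * 2.0 for v in values]   # every lane runs the "if" path
    else_results = [v + 10.0 for v in values]  # every lane runs the "else" path
    # Each lane keeps one result and throws the other away.
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]


lanes = shade_lanes([1.0, 5.0], 3.0)  # [11.0, 10.0]
```

Both paths cost time for every lane, which is why branching on varying inputs is more expensive than branching on a uniform.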

Compilation

Shaders can be compiled ahead of time or when the program runs. The code is stored as a string.

In my experience with GLUT and SFML, the input is a GLSL file to the library/binding, and if it compiles correctly then the program is stored on the GPU and you get an ID to identify it. You can compile multiple shaders and swap which shader program to use with the IDs.

Ins and Outs of the programs

The first programmable step is the vertex shader.

Before the vertex shader, though, input is assembled. This means input vertex position data is paired with any other vertex data, like a vertex having a color or UV position, or both! The utility is being able to swap out color or UV data but use the same mesh.

Instancing is when the same input mesh is drawn multiple times in a single draw call, but some other associated input data may vary per instance.

The vertex shader only operates on vertices and any associated data with the vertex. It does not create or modify the triangles or the mesh. It focuses on changing colors, texture coordinates, normals, and ultimately returning its final clip space position (implying any projection techniques happen here too).

Input vertices are treated independently and have no data on the other vertices in the draw call.

Common effects using just the vertex shader

• Scale, rotate, translate, shear a mesh
• Fish eye lens
• Vertex blending for animations
• Silhouette rendering
• Procedural deformation
• Cloth, flags, water, etc.
• Heat haze, water ripples
• Can be performed by storing a screen image into the frame buffer, then using that buffer as a texture on top of a deformed mesh.
• Vertex Texture Fetch - using texture data to manipulate the mesh.
• Like having a ground geometry data stored as a texture to determine the height of the ground.

As input, the geometry shader receives one of the 3 primitives - a point, line, or triangle. Other primitives can be defined and then used (apparently). Additionally, the adjacent points, lines, or triangles can be made available to the program.

The program outputs points, polylines, and triangle strips. You can also output nothing, causing the primitive to be deleted. There is a rough upper limit of creating 1,000 primitives per invocation, so don't use this for tessellations, but a few copies is okay.

The shader defines which type of input it wants and which type of output it will result in. They don't have to match, so a mesh defined using triangles can output points or lines.

You can even modify vertex data here. Similar to the vertex shader though, the output position of each vertex must be in clip space.

Stream Output is a new feature where data from the Geometry stage is sent back to the application as an array of data. Normally the data goes straight to the Rasterization stage. Rasterization can be turned off if wanted! This turns the GPU into a general data processor that specializes in SIMD operations!

Common effects using the Geometry shader

• Creating variously sized particles from point data
• Extruding fins (planes with a texture that intersect the main model mesh) along a silhouette for fur
• Metaballs (think lava lamps where blobs intersect and merge with each other)
• Fabric deformation

The fragment shader operates on one screen pixel at a time. When a triangle covers a pixel's center (its sample), a fragment is generated and this program processes it. The program returns a color which will be stored and merged into the color buffer during the merging process.

Input data comes directly from the vertex or geometry shader. 16 or 32 vectors may be used as inputs respectively. Additional inputs like the screen position of the pixel and whether the triangle’s front or back is facing are available.

Similar to the vertex shader, the program does not know about any other pixels. The exception is for differential functions (also referred to as gradients, these can calculate how a value changes from pixel to pixel along either the x or y direction). This is useful for filtering and edge detection.

The depth value (generated in the rasterization stage) can be modified here. The value in the stencil buffer (also from the rasterization stage) can only be read from.

You can generate no output, but that forces poor optimization from the GPU (hide the mesh other ways if possible).

Multiple Render Targets (MRTs)

The fragment shader can output to multiple buffers, not just the one color buffer meant for the display, in one execution!

This is a very efficient technique in cases where we would otherwise need to run multiple passes.

Merging

Again, output colors are combined with whatever is in the frame buffer at the moment. Either colors are overwritten or they can be multiplied, added, subtracted, bitwise… there are so many options.
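
A minimal Python sketch of a few blend options, assuming colors are tuples of floats in the 0 to 1 range. The mode names are made up for illustration and are not the names any particular graphics API uses.

```python
def blend(src, dst, mode="over", alpha=1.0):
    """Combine an incoming color (src) with the stored color (dst)."""
    if mode == "replace":
        out = src
    elif mode == "add":
        out = tuple(s + d for s, d in zip(src, dst))
    elif mode == "multiply":
        out = tuple(s * d for s, d in zip(src, dst))
    elif mode == "over":  # alpha blend: mix src over dst
        out = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))
    else:
        raise ValueError(f"unknown blend mode: {mode}")
    return tuple(min(1.0, max(0.0, c)) for c in out)  # clamp to [0, 1]


# Half-transparent red drawn over blue.
purple = blend((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), "over", alpha=0.5)
```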

If MRT is being used, then the blending applies to all the buffers too. Though there are new options to allow different render targets to have different blend options.

Materials (Effects)

Take a step back and look at a GPU pipeline with specific vertex, geometry, and fragment shaders, merge settings, and maybe even multiple passes. The entire setup is usually grouped together as a single Material, also called an effect: a single, specific visual effect that the configured system produces using a defined lighting equation with specific expected/allowed inputs.

Tools exist that encapsulate this idea of an effect where a single file contains all the instructions and programs for the GPU.

Unity’s shader language and shader files operate very much like this. A shader file defines passes, vertex and fragment programs, input and output data, and so on. Then, in the editor, you can create a material that uses a custom Unity shader. In this sense, a material instance is a use of a shader with assigned, specific data provided.

For example, Unity's default material uses a default lighting shader. Each unique model will have its own unique material assigned to it that will provide the color and texture data for the model.

What is a rendering engine?

Setting up an OpenGL binding and window with a draw loop is a pain in the arm (albeit an educational one off pain). An engine does that for us and simplifies writing code.

Unity is one engine (and it runs games too!), but so are things like SFML.

Depending on the engine, you'll likely be writing Effect files which contain some pure shader code for the vertex, geometry, or fragment shader. The rest of the effect file will be engine specific code telling how to use the effect.

This is why I find it best to study general GPU theory rather than a specific engine, because in order to understand and use the engine you'll need to know the system.

What are Semantics?

You'll often see Effect code like

struct appdata {
    float3 position : POSITION;
    float3 normal   : NORMAL;
};

Semantics are those all caps words after the colon. They signify where the data comes from or where the data is going. For example, our struct appdata here stores the input mesh's per-vertex position and normal data.

These are assigned during the input pairing stage. This is how the GPU knows which values to assign to the fields in this struct.

Semantics are one of the few BIG differences between DirectX and OpenGL. OpenGL does not use semantics at all because OpenGL links the vertex, geometry, and fragment shaders into a single program (so all inputs and outputs, and where they go, are already known). DirectX compiles them separately so it's important to define where the inputs and outputs are coming from and going to.

The names of the semantics and what data they provide are totally dependent on the engine you're using. DirectX provides a set of them on its own, and a good engine will include these and build on them. So simply pray to whatever goddess you hold dear that the information is well documented. Or the best bet is to find examples and start building a public library of semantics.

Basic 3D Terminology

This is a generalized, quick section that introduces words you will see in the field, but explained simply and without much detail as to how and why.

Let's have fun and explore making things, then get into detail about the subjects you wanna explore!

Math

This is a math heavy section and the ideas are covered more in detail in my linear algebra notes and quaternion notes.

I'm also tired of getting bogged down in trying to fully understand the math at play when really all you need to know is the definitions (so you can understand ish what is being conveyed by technical write ups) and a very general why the things are used.

• Transformation - Any operation that modifies a point, vector, or position.
• Moving a 3D model is a transformation.
• So is rotation, scaling, projections, etc.
• Affine Transformation - A transformation where the object keeps its basic structure: straight lines stay straight and parallel lines stay parallel.
• Moving (translation), scaling, rotation.
• Linear Transformation - the transformation of some bigger object will result in the same thing if you independently transform the smaller objects.
•  Imagine a 3D model character with a head and body. We can either scale the entire model, or scale the body and head separately. Scaling is a linear transformation because we get the same result if we scale the entire model compared to scaling the head and body separately.

Linear Algebra

• Linear Algebra - A section of math focused on linear transformations and matrices.
• Mapping a Vector (aka Mapping) - when a vector/vertex is multiplied by a matrix. The result is another vector where the effect on the vertex depends on the values in the matrix (rotation, scaling, projection, etc.)
• Matrices with values in specific entries and 0s in others can be created that result in a single specific transformation effect like rotating or translating (moving).
• A more complicated matrix can be formed to do multiple transformations at once.
• Matrix multiplication - because of the linear properties of the transformations we use in 3D, we can multiply matrices together to create a combined matrix that does multiple transformations in one mapping computation!
• It's beneficial to compute a combined matrix CPU side first then use that as a uniform in the vertex shader because all vertices will want to be mapped the same way.
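
A minimal Python sketch of combining matrices, assuming 4x4 row-major nested lists and column vectors (so the rightmost matrix is applied first). The helper names are made up.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Map a column vector v = (x, y, z, w) through matrix m."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(s):
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]


# Scale by 2 first, then move 5 units along x.
combined = mat_mul(translation(5, 0, 0), scaling(2))

vertex = [1, 1, 1, 1]
step_by_step = transform(translation(5, 0, 0), transform(scaling(2), vertex))
one_shot = transform(combined, vertex)  # same result with a single mapping
```

The combined matrix only has to be built once on the CPU, while the single `transform` call is what would run per vertex.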

Spaces

• Space - refers to a particular 3D coordinate system with an origin at a specific point and a specific scale for units.
• Model/Object Space - The space where a 3D model is created.
• For example, you could model a character in Blender where the origin of the model is at (0, 0, 0) and stands up to 15 Units tall. But how big is this in Unity? Unreal? Another engine? Where does the model stand in the game? The model space only knows about the model itself and relative distances around the model based on the modeling software.
• World/Game Space - The space where many objects/models and the game exists.
• Projections - A transformation that takes objects of 3 dimensions and puts it into 2 dimensions. A 3D model to a 2D screen.
• The dimensions can be any number, not just 3 to 2.
• Orthographic Projection - A very simple projection where the depth values are simply ignored. Objects are not smaller in the distance than objects closer to the camera.
• This is a common use case for 2D games.
• Because of the Z buffer still being written to, objects will still render on top of each other using the assigned depth values!
• Any sense of depth is made artificially by creating smaller sprites or scaling things down.
• Perspective Projection - Simulates human vision by making distant objects appear smaller and nearer objects bigger.
• Here, the depth value is used to scale objects. The farther away the smaller they become (and sometimes scale up when closer).
• Sense of depth is computed by the GPU automatically (during the vertex shader) by multiplying each vertex by a projection matrix.
• The math for this isn't terribly complex and is extremely interesting! I recommend finding a tutorial on it!
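
A minimal Python sketch comparing the two projections, assuming a camera at the origin looking down +z and points with z > 0. A real projection matrix also remaps depth into the unit cube, which is skipped here. The function names are made up.

```python
def project_perspective(x, y, z, focal_length=1.0):
    """Project a view-space point onto an image plane at z = focal_length."""
    scale = focal_length / z  # farther points (bigger z) get a smaller scale
    return x * scale, y * scale


def project_orthographic(x, y, z):
    """Orthographic projection simply drops the depth value."""
    return x, y


near = project_perspective(1.0, 1.0, 2.0)  # (0.5, 0.5)
far = project_perspective(1.0, 1.0, 4.0)   # (0.25, 0.25)
```

The same point lands closer to the screen center the farther away it is, which is the whole "distant things look smaller" effect.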

Modeling

General Modeling Terminology

• Modeling - Building a 3D object made of vertices.
• UV - A 2D coordinate system where 3D vertices are mapped or assigned to so that a 2D image can be placed onto the model.
• Texturing/Unwrapping - Assigning vertices a UV value and declaring where edges are so the model can be given a 2D texture to apply to its faces in a logical manner that helps us artists.
• This is a fascinating and tricky field to get right and there are many techniques!
• Technically referred to as Mesh Parameterization
• Sculpting - A 3D model tool or program where instead of manually moving and creating vertices, you blob, push, cut, chisel, and mold a model much like you would with clay in reality.
• It's a fun tool and is especially good for organic creations (creatures, people, flora, etc.)
• Sculpts are not often suited for practical use because they often result in very high polygon counts which can't be rendered quickly by games and films. The workaround is to take a sculpted model and transform it into a lower polygon model, then create HD textures using the high poly model to simulate extra grooves and ridges.
• Retopology (retopo) - The process of converting a high polygon model to a lower polygon model.

Animations

• Bone - A piece of information about a model that allows for itself to be transformed and in turn will transform some number of vertices of a model.
• Rigging - attaching bones to a 3D model. Attaching meaning the bones cause the assigned vertices to move.
• Vertex Weight - Allowing vertices to be assigned to multiple bones and giving each vertex a weight value per bone. A higher value (at most 1.0) means the bone's transforms modify that vertex more, and a lower value means the bone only modifies the vertex a little bit. 0 means no transformation by the bone.
• Morphing or Blend Shapes - After defining a resting/neutral shape, you can manually move vertices in the model to precise positions and then assign the changes to a shape key. You can create multiple poses. Then, when animating, you can tell the model to use certain amounts of these shapes to create specific looks.
• It's a nice alternative for models or sections of a model with high vertex counts where bones might be too broad or imprecise for specific looks. Like mouths and eyes.
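
A minimal Python sketch of vertex weights, assuming each bone is just a function that moves a position and that a vertex's weights sum to 1. Real rigs use matrices per bone; the bones and names here are hypothetical.

```python
def skin_vertex(position, influences):
    """Blend the positions produced by each bone, weighted per bone.

    influences: list of (weight, bone_transform) pairs whose weights sum to 1.
    bone_transform: a function mapping an (x, y, z) tuple to a new tuple.
    """
    result = [0.0, 0.0, 0.0]
    for weight, bone_transform in influences:
        moved = bone_transform(position)
        for axis in range(3):
            result[axis] += weight * moved[axis]
    return tuple(result)


def stay(p):  # hypothetical bone that does not move
    return p


def lift(p):  # hypothetical bone that moves vertices up by 2 units
    return (p[0], p[1] + 2.0, p[2])


# A vertex mostly owned by the still bone only rises a quarter of the way.
elbow_vertex = skin_vertex((1.0, 0.0, 0.0), [(0.75, stay), (0.25, lift)])
```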

Graphics

• Shading - When a fragment is processed, the shader determines which color to output based on the angle the camera is viewing the surface from and on any light sources.
• Lambert and Phong are common, simple shading algorithms, but they produce that common “ugly” default plastic 3D look.
• When light hits an object, how bright the object is depends on how much the face’s normal is pointing in the opposite direction of the light.
• A specular effect (white/bright spot) is added using the same idea.
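
A minimal Python sketch of the Lambert (diffuse) term, assuming the light direction is given as the direction from the surface toward the light. Function names are made up.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert(normal, to_light):
    """Diffuse brightness in [0, 1] from the angle between normal and light."""
    n = normalize(normal)
    light_dir = normalize(to_light)  # direction from the surface toward the light
    # Dot product is 1 when the normal faces the light, 0 at 90 degrees.
    return max(0.0, sum(a * b for a, b in zip(n, light_dir)))


facing = lambert((0.0, 1.0, 0.0), (0.0, 1.0, 0.0))   # light straight above
grazing = lambert((0.0, 1.0, 0.0), (1.0, 0.0, 0.0))  # light from the side
```

The `max(0.0, ...)` keeps faces pointing away from the light at zero brightness instead of going negative.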

Anti-Aliasing

• Alias (artifact) - A rendering error where smooth objects or text appear grainy, rough, and jagged.
• This results from trying to convert rounded lines to pixels, which are squares.
• When a mesh only partially covers a pixel, a fragment is still likely generated for that pixel and so the entire pixel will be filled as though the mesh covered the pixel entirely.
• This also occurs when sampling textures, since UV positions may not always fall exactly on pixel boundaries in the texture.
• Anti-Aliasing (AA) refers to techniques that reduce aliasing.
• Multi Sample AA (MSAA) - A technique where instead of sampling each pixel at exactly one point (its center), several sample points inside the pixel are tested and weighted together to produce an average color.
• There are tons of different ways to choose which samples to use and what weight to give them.
• Generally the more samples used the better the results, at the cost of increased processing demand.

Textures

• Textures are images that can be used in some manner when rendering a mesh. Most obviously for color - applying a color image to a mesh.
• Bumpmap - a texture that modifies normal values of a mesh, causing surfaces to appear bumpy where specified.
• Parallax or Relief mapping - will cause some parts of the mesh to render on top of others even though the mesh normally wouldn't. This can be used to enhance a brick texture so that closer bricks will actually render on top and hide parts of further bricks, though the geometry may be flat.
• Displacement mapping - vertices are moved (and created) to actually displace a mesh.
• Texels are the pixels of a texture.

Texture Pipeline

There's a common flow whenever a texture is applied to a mesh:

1. Given we know where in model (sometimes world) space this current fragment/pixel being rendered is on the mesh...
2. Projector stage - transform that position to a UV value. This is usually done through interpolation since each vertex of a mesh often has a provided UV value.
3. Corresponder stage - Convert the UV value to a texel location.
4. Obtain value stage - Actually grab the value from the texture.
5. Modify and use the returned value.

Projector Stage

When building a 3D mesh, you will typically set specific UV values. However, you don't always need to provide them; you could use an algorithm to generate them in one of the programmable shaders.

Examples of algorithmic projector stages are environment mapping and triplanar mapping.

Corresponder Stage

Here the UV values are transformed, modified, and prepared for fetching data from the texture. Common tools like stretching and rotating the texture can happen here.

Many shading languages use the range [0, 1) for UV values, where (0, 0) represents the origin (usually the bottom left corner) and (0.999999…, 0.99999…) represents the top right corner.

What happens to values beyond this range can be toggled in one of the programmable stages. UV values can be…

• Clamped - anything below 0 will be set to 0, and above 1 will be set to 1.
• The pixel on the border will be used for any UV values outside the range, causing the texture to “bleed”.
• Wrapped - as though the UV values were run through a modulo function, meaning 1.3 becomes 0.3 and -0.5 becomes 0.5
• This produces a tiling effect.
• Mirrored - The texture will mirror itself at the integer boundaries beyond [0, 1)
• Border - UV values outside the range will return a separately defined border color instead.
• This effect is similar to Clamped, but instead the texture doesn't bleed.
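
A minimal Python sketch of the wrap modes for a single coordinate (apply it to U and V separately). The mode names are made up, and returning None for "border" is just this sketch's way of saying "use the border color".

```python
import math


def address(u, mode):
    """Map one texture coordinate into [0, 1] using the given wrap mode."""
    if mode == "clamp":
        return min(1.0, max(0.0, u))
    if mode == "wrap":
        return u - math.floor(u)           # 1.3 -> 0.3, -0.5 -> 0.5
    if mode == "mirror":
        t = u % 2.0                        # Python's % is non-negative here
        return t if t <= 1.0 else 2.0 - t  # flip every other tile
    if mode == "border":
        return u if 0.0 <= u <= 1.0 else None
    raise ValueError(f"unknown wrap mode: {mode}")
```

For example, `address(1.25, "mirror")` gives 0.75 because the second tile runs backwards.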

Tiling textures is a standard practice to fill environments, but can lead to situations where the tile becomes obvious and it kills the immersion or feel of the scene.

One solution is to combine multiple textures to create hundreds of unique possibilities. Or to use a set of tiles that all share the same border so they can be used interchangeably.

Obtain Value Stage

This stage seems straightforward - convert a UV value to a texel and return it, but there is some nuance.

Let's say we're applying a 256x256 texture onto a square. As long as the square is projected to roughly 256x256 screen pixels big, everything is great! We get problems when the square becomes smaller (minification) or bigger (magnification) than 256 screen pixels big. The texture will get distorted, so how do we get around that?

Magnification solutions

The easiest solution is just to grab the nearest texel, but this can result in a lot of aliasing. Some more refined approaches are bilinear interpolation and cubic interpolation, where an array of nearby neighboring texels are fetched and weighted to calculate an average color. These methods are more effective at the cost of more GPU work.
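
A minimal Python sketch comparing nearest-texel and bilinear sampling, assuming a grayscale texture stored as a list of rows, UVs in [0, 1], texel centers at half-integer positions, and clamping at the edges. Function names are made up.

```python
def sample_nearest(texture, u, v):
    """Return the single texel that (u, v) falls inside."""
    height, width = len(texture), len(texture[0])
    x = min(width - 1, int(u * width))
    y = min(height - 1, int(v * height))
    return texture[y][x]


def sample_bilinear(texture, u, v):
    """Blend the 4 texels surrounding (u, v), weighted by distance."""
    height, width = len(texture), len(texture[0])
    # Shift by 0.5 so integer positions line up with texel centers.
    x = min(max(u * width - 0.5, 0.0), width - 1.0)
    y = min(max(v * height - 0.5, 0.0), height - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    row0 = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
    row1 = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
    return row0 * (1 - fy) + row1 * fy


texture = [[0.0, 1.0],
           [0.0, 1.0]]  # black left column, white right column
hard_edge = sample_nearest(texture, 0.5, 0.5)   # 1.0
soft_edge = sample_bilinear(texture, 0.5, 0.5)  # 0.5
```

Nearest snaps to one texel (blocky when magnified), while bilinear fades between neighbors at the cost of 4 fetches per sample.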

Another fix is to apply high res detail textures on top of the magnified texture. Another is to use vector graphics (SVG files), which can be magnified losslessly.

Minification solutions

Nearest Neighbor can be used again, but will result in worse artifacting because so many texels may influence a single pixel.

Temporal Aliasing refers to aliasing artifacts that change over time (flickering or shimmering) as the object moves relative to the camera.

Bilinear interpolation is better but still causes aliasing. This problem is harder to solve because it would require sampling tons of texels per pixel. A proper solution is to compute some modified textures beforehand using some more costly algorithms, then swap in these minified textures.

Mipmaps are simply a list of minified textures. Level 0 (the original, unmodified texture) is scaled down to half its size using whatever minification algorithm you like. This repeats all the way down to a 1x1 pixel image. Ideally these images are created outside of the GPU before rendering.
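
A minimal Python sketch of building a mipmap chain, assuming a square grayscale texture whose side is a power of two and a simple 2x2 box filter as the minification algorithm (one choice among many). Function names are made up.

```python
def downsample(level):
    """Halve a square grayscale image by averaging each 2x2 block."""
    half = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(half)
        ]
        for y in range(half)
    ]


def build_mipmaps(texture):
    """Return [level 0, level 1, ...] down to a 1x1 image."""
    levels = [texture]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels


checker = [[0.0, 1.0, 0.0, 1.0],
           [1.0, 0.0, 1.0, 0.0],
           [0.0, 1.0, 0.0, 1.0],
           [1.0, 0.0, 1.0, 0.0]]
mips = build_mipmaps(checker)  # 4x4, 2x2, 1x1
```

The checkerboard turns into flat gray at every smaller level, which is exactly the averaged color a far away pixel should see.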

Deciding which submap (a single mipmap level) to use is calculated at runtime. The resulting value is called the Level Of Detail (LOD), or lambda. You calculate 4 values - how the U and V values change as the screen’s X and Y values change - then take the biggest absolute value. These calculations are provided to us as gradients in shader programs.

The LOD value is a number corresponding to a mipmap level, with 0 being the lowest and representing no minification. Fractional values may be used, meaning the returned color value from the texture is linearly interpolated between 2 different mipmap levels.
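
A minimal Python sketch of picking a mipmap level from the four gradients, assuming a square texture and the simplified "biggest gradient wins" rule described above (real hardware uses variations of this). sample_level is a hypothetical callback standing in for a texture fetch at a given level.

```python
import math


def mip_lod(du_dx, dv_dx, du_dy, dv_dy, texture_size, level_count):
    """Pick a (fractional) mipmap level from the four UV gradients."""
    # Convert UV change per screen pixel into texels per screen pixel.
    texels_per_pixel = texture_size * max(
        abs(du_dx), abs(dv_dx), abs(du_dy), abs(dv_dy)
    )
    if texels_per_pixel <= 1.0:
        return 0.0                     # magnification: stay on level 0
    lod = math.log2(texels_per_pixel)  # each level halves the resolution
    return min(lod, level_count - 1.0)


def blend_levels(lod, sample_level):
    """Linearly interpolate between the two mipmap levels around lod."""
    low = int(lod)
    frac = lod - low
    if frac == 0.0:
        return sample_level(low)
    return sample_level(low) * (1.0 - frac) + sample_level(low + 1) * frac


# One screen pixel step covers 4 texels in u on a 256x256 texture.
lod = mip_lod(4 / 256, 0.0, 0.0, 1 / 256, texture_size=256, level_count=9)
```

Here 4 texels per pixel gives log2(4) = 2, so level 2 (a quarter of the original resolution) is the match.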

The flaw of mipmaps is that they favor the smaller, blurrier image when the gradient values vary wildly between directions. Imagine a camera sitting on a long table. The table pixels close up will want a low LOD value, but table pixels far from the camera will want a higher LOD value. Even though the mipmap level is chosen per pixel, the choice uses only the biggest gradient: because the table is seen at a shallow angle, the texel position changes quickly as the screen y increases, and that high rate of change wins, so pixels get a lower res mipmap than their x direction needs.

A deviation of mipmaps is the Ripmap - textures are also minified down as rectangles, so that it can better handle heavily skewed sampling like the problem above describes. But this still doesn't cover all problems and costs quite a large amount of memory to store all the textures.

An alternate algorithm to mipmaps is Summed Area Tables (SAT). We calculate and store an average pixel color at each texture pixel, then when a screen pixel is mapped on to the texture, a fairly accurate average color for that exact area can be returned quickly! This algorithm still fails when the texture is very skewed and rotated, resulting in a rotated rectangle being mapped to the texture, which cannot handle rotations in this calculation, again resulting in blurring.
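A small sketch of a summed area table on a greyscale texture. It assumes the table stores running sums (with an extra row and column of zeros to keep the lookups simple) and that the queried rectangle is axis-aligned and half-open; build_sat and box_average are names invented for this example.

```python
def build_sat(tex):
    """sat[y][x] holds the sum of all texels above and to the left of (x, y)."""
    h, w = len(tex), len(tex[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = tex[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    return sat


def box_average(sat, x0, y0, x1, y1):
    """Average of texels with x0 <= x < x1 and y0 <= y < y1, using 4 lookups."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))
```

The cost of box_average is the same whether the rectangle covers 4 texels or 4 million, which is the whole appeal.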

Both Ripmaps and SATs are types of Anisotropic Filtering Algorithms - algorithms that retrieve texel values over areas that are not square.

However, in the infinite wisdom that is graphics rendering jargon, there is one highly used algorithm today called…

Anisotropic Filtering - it uses the same algorithm as mipmaps, but takes multiple samples instead of just one at the center.

1. The screen pixel is projected onto the texture, meaning a rectangle of area covering some number of texels that need to be minified. This rectangle may not be axis-aligned with the texture.
2. The d/LOD/lambda value is calculated using the smallest side of this rectangle.
3. Depending on the ratio between the shorter and longer sides of the rectangle, some number of samples are taken from the mipmap. A 2:1 ratio means 2 samples, for example. There are many different ways to determine how many samples and where to take and weight them along the anisotropic axis.
4. Anisotropic Axis - The samples are taken along the middle of the rectangle's shorter side, along an axis running parallel to the longer side.
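The steps above can be sketched roughly like this. It assumes the projected rectangle's side lengths are already known in texels, uses one sample per unit of the long:short ratio (capped at a hypothetical max_samples), and spaces the samples evenly along the anisotropic axis. Actual hardware is free to choose counts and weights differently.

```python
import math


def aniso_params(long_side, short_side, max_samples=16):
    """Return (sample count, LOD), with the LOD taken from the shorter side."""
    short_side = max(short_side, 1e-6)  # avoid dividing by zero
    ratio = long_side / short_side
    count = min(max(1, math.ceil(ratio)), max_samples)
    lod = math.log2(max(short_side, 1.0))
    return count, lod


def sample_positions(center, axis_dir, long_side, count):
    """Evenly spaced sample centers along the anisotropic axis.

    center is the (u, v) middle of the rectangle in texels and axis_dir is a
    unit vector parallel to the rectangle's longer side.
    """
    cu, cv = center
    du, dv = axis_dir
    positions = []
    for i in range(count):
        offset = long_side * ((i + 0.5) / count - 0.5)
        positions.append((cu + du * offset, cv + dv * offset))
    return positions
```

An 8x4 texel footprint gives 2 samples at LOD 2, placed 2 texels either side of the center along the long axis. Each sample is then an ordinary mipmap fetch and the results are averaged.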

This process produces better results than just mipmaps and doesn't require any additional texture memory space!

Specialty Textures

Volume Textures

These textures are made up of some number of 2D slices of textures. Some medical imaging tools produce 3D volume textures like this.

They certainly require a lot of memory, so are really only useful for specific situations that would benefit from a volume texture, like volumetric lights, a carved marble or wood statue model.

If a render has issues with seams, volume textures can help: parameterization is very straightforward because we're not projecting down to 2D.

Cubemaps

6 2D textures are stitched together to form a cube, where the textures face inward toward the center of the cube. Ideally, the textures all seamlessly flow at the edges so the faces of the cube are hard to notice.

The cube map is sampled very differently. A 3D unit vector is used to fetch a value. The vector starts at the center, inside the cube and points outward to one of the “walls” of the cube. Simply imagine a vector representing the angle someone is looking at from inside the cube. I have a thorough write up on using cube maps here.
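A sketch of how a direction vector can be turned into a face and a 2D coordinate on that face: the component with the largest magnitude picks the face, and the other two components are divided by it. The sign conventions below follow the OpenGL face layout as far as I know; other APIs or engines may flip axes, so treat them as an assumption.

```python
def cube_face_uv(x, y, z):
    """Map a direction (need not be unit length) to (face, u, v), u and v in [0, 1]."""
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must be non-zero")
    if ax >= ay and ax >= az:      # looking mostly along the X axis
        face = "+x" if x > 0 else "-x"
        sc, tc, ma = (-z if x > 0 else z), -y, ax
    elif ay >= az:                 # looking mostly along the Y axis
        face = "+y" if y > 0 else "-y"
        sc, tc, ma = x, (z if y > 0 else -z), ay
    else:                          # looking mostly along the Z axis
        face = "+z" if z > 0 else "-z"
        sc, tc, ma = (x if z > 0 else -x), -y, az
    # Dividing by the major axis projects onto the face, giving values in [-1, 1].
    u = 0.5 * (sc / ma + 1.0)
    v = 0.5 * (tc / ma + 1.0)
    return face, u, v
```

Looking straight down an axis lands in the middle of that face at (0.5, 0.5), and directions pointing at a cube edge land on the border of a face, which is where the seams come from.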

Cubemaps can use mipmaps and interpolation, but hardware cannot interpolate between cubemap faces, which can make the seams more obvious. You can still interpolate manually by writing your own shaders.

Another issue is that texels in a cube map become more skewed toward the corners of the cube as opposed to the center of each cube face. Again, this can be accounted for but not usually in hardware.

Other Thoughts

Texture Caching

Textures take up a lot of memory (since it's image data) so a good caching algorithm on the GPU is needed. Usually a Least Recently Used (LRU) algorithm is used, much like a CPU's cache.

For optimization, it's recommended to group polygons by the textures they use.

Clipmapping is a technique to further save memory by partially loading mipmap data. An HD surface/Earth texture in a flight sim would only need a small bit of the level 0 texture, a bit more of level 1, and so on. The giant mipmap is clipped so the entire thing doesn't need to be uploaded. The book doesn't provide an algorithm, so how exactly this works is for further reading.

Texture Compression

A handy solution to caching, uploading, storing, AND fetching would be to store the images in a nice compressed state that can be decoded easily on the GPU.

DirectX Texture Compression or Block Compression (DXTC or BC) describes a series of lossy compression algorithms that are pretty neat. The image is broken into 4x4 pixel blocks. Each block is assigned 2 RGB colors, and each pixel can only pick one of those colors or one of a few interpolated steps between them. There are slight variations that allow for alpha and store more color fidelity, at the cost of worse compression. ATI-specific variants called ATI1 and ATI2 (3Dc) were also made.

Ericsson Texture Compression (ETC) is the variant adopted by OpenGL ES. The image is broken into 4x4 blocks. The blocks are then divided into two 2x4 chunks. Each chunk stores a single base color and selects a lookup table of 4 predefined constants. Each pixel chooses one of those 4 constants to modify the chunk's base color. Produces nice results and compresses well!
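To show how little data a block holds, here is a sketch of decoding one block in the simplest variant (BC1/DXT1): 8 bytes holding two 16-bit RGB565 colors and sixteen 2-bit palette indices. The exact rounding of the interpolated colors differs between decoders, so the integer math below is an assumption, and the function names are made up.

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a packed 5:6:5 color to 8 bits per channel."""
    r, g, b = (c >> 11) & 0x1F, (c >> 5) & 0x3F, c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_bc1_block(block):
    """Decode 8 bytes into a 4x4 list of (r, g, b) tuples."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # 4-color mode: the two endpoints plus two interpolated steps.
        palette = [p0, p1,
                   tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
                   tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:
        # 3-color mode: one midpoint; index 3 is transparent black in BC1
        # (alpha is ignored in this sketch).
        palette = [p0, p1,
                   tuple((a + b) // 2 for a, b in zip(p0, p1)),
                   (0, 0, 0)]
    rows = []
    for y in range(4):
        # 2 bits per pixel, row by row, lowest bits first.
        rows.append([palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)])
    return rows
```

16 pixels at 24 bits each would be 48 bytes uncompressed, so this is a fixed 6:1 ratio, and any block can be decoded on its own without touching its neighbors.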

Procedural Textures

Instead of fetching from an image file, you can write a shader to generate a color (based on UV values if you want). Graphics cards have optimized for image texture fetching and reading, so avoiding that can be a costly decision for real time applications.

Nice use cases are for volumetric textures (like a carved marble statue) and dynamic effects like water ripples.

Animated Textures

Textures can be animated by either rendering video data or changing UV values per frame. You can even blend between textures over time so that the texture appears to change.

Material Mapping

A fancy term to mean using texture/image data to modify how a material is processed, thereby allowing a single material to produce slightly different results over the same mesh.

A common case is a diffuse map meaning the main color of a mesh that's using a typical shading equation.

Alpha Mapping

Giving a texture alpha values so that parts of it render transparently.

This can be used to put a decal on top of another mesh, or to create very inexpensive detailed illusions like a bush with hundreds of branches and leaves that's really just a picture - render a single polygon plane with the texture on it. This illusion fails when the camera is rotated, but you can add an intersecting plane with the same texture to help keep the illusion a teeny bit. This is called a cross tree. Billboarding can further fix this, but that's discussed later.

Alpha Blending is a tool to allow rendering of objects with partial transparency, but it's a teeny bit costly since fully transparent pixels still go through the merging pipeline and are not discarded though they will not have any visible effect.

Alpha Testing is when a pixel is discarded during the merging stage because its alpha value is below some threshold value. Apparently the pixel's alpha value can then only be rendered as either fully opaque or fully transparent, which can lead to artifacts. Further apparently, this can be mitigated by computing the alpha map (thus allowing a range of alpha values) as a distance field which is discussed later.

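Alpha to coverage or transparency adaptive anti-aliasing is a technique where a translucent (partially transparent) pixel is converted into a fully opaque pixel, but fewer of the pixel's multisample (MSAA) samples are covered. The alpha value decides how many samples are covered, so when the samples are averaged the pixel still ends up partially blended with what's behind it, without needing alpha blending. Because the covered samples are opaque they will correctly obscure objects behind them.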

Bump Mapping

This is a catchall phrase for techniques to create illusions that a mesh has a more detailed surface than just an image can provide, but less detailed than adding raw geometry.

Detail can be broken down into 3 scales - macro, meso, and micro. Macro is the 3D modeling part, the actual geometry. Here the goal is to create recognizable silhouettes and shapes. Micro describes shaders and specific lighting details, like how skin can do subsurface scattering or a polished piece of metal creates fine reflections.

Meso detail is anything in between and describes effects like facial wrinkles or the bumpy surface of brick and stone. The detail modifies data in the pixel shader.

Blinn introduced the idea of bump mapping - creating an image that will modify a surface's normal at a pixel level, allowing a per-pixel normal value which will affect lighting equations that (almost all) use the surface's normal to determine factors like brightness, color, and shine/specular.

Tangent Space Basis

So we are modifying a surface's normal by some amount based on a normal map. While we could define the modification based on the model space, it simplifies things by instead modifying the normal by the surface's own surface-space. This is called the tangent space basis (a very fancy math term to mean a 3D space like model space). This is calculated per vertex and is stored as…

Tangent and bitangent vectors are values assigned to a vertex, much like a vertex normal, and determine the orientation of a normal map on a mesh's surface. Tangent means orthogonal to the normal, and bitangent is just a complementary vector that's also orthogonal to the normal but not necessarily orthogonal to the first tangent vector. When the three vectors are orthogonal, one of them can be omitted to save space and then recalculated per vertex in a shader as the cross product of the other two. But this fails when the normal map is used on a symmetric (mirrored) mesh, where the recalculated vector sometimes points the opposite way even though the same normal map is used. A solution is to store the handedness of the vertex, which is then used to orient the calculated vector correctly.

Early Algorithms

Offset Map or Offset Vector Bump Map was created by Blinn and stores 2 signed values at each texel location. One to modify a normal by the texture's U direction, and another to modify a normal by the V direction (using tangent space basis). The result is a non-normalized vector pointing somewhere slightly else than the surface's normal usually would.

A Heightfield is a greyscale texture where white represents high and black represents low. Deviations to the normal are calculated by examining how the U direction of texels change and the same in the V direction. This is a common procedure and is referred to as the image's derivative since it's a result of how the image changes. There are a variety of algorithms and filters you can use to calculate the normal offset from a heightmap.

These methods have generally been replaced by normal maps.

Normal Map

Older systems call it dot product bump mapping.

Normal maps are a pre-computed form of bump map, like described in the section above. They give a surface normal value per pixel instead of calculating it based on some offset values.

The change occurred when data storage became bigger, so storing a full 3-component vector per texel wasn't a concern anymore. Plus the savings in per pixel computation are good.

The texture stores a value between [-1, 1] as a color, so [0, 255] with 128 mapping to 0. This is why a light blue color indicates no deviation of the normal - color (128, 128, 255) maps to a vector of (0, 0, 1) meaning straight up!
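A tiny sketch of that color-to-vector mapping, assuming 8-bit channels and the common c / 255 * 2 - 1 remap. Note that 128 lands very slightly above 0 rather than exactly on it, which is one reason the result is normalized afterwards. decode_normal is a hypothetical name.

```python
import math


def decode_normal(r, g, b):
    """Turn an 8-bit RGB texel into a unit-length tangent space normal."""
    n = [c / 255.0 * 2.0 - 1.0 for c in (r, g, b)]
    # With integer channels no component is exactly 0, so the length is never 0.
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)
```

decode_normal(128, 128, 255) comes out almost exactly (0, 0, 1), the "straight up" normal, while a reddish texel like (255, 128, 128) points along the tangent's +X direction.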

So. As discussed in the previous section, normals are stored in tangent space (relative to the surface). We can either convert an incoming light per vertex to this tangent space, or we can convert the tangent space normal to world space. It's standard to convert to world space because converting every light to tangent space quickly becomes unwieldy once a scene has many lights, and surface-based reflections need to be in world space anyway.

Normal maps will always fail when looked at from a shallow (grazing) angle, because the ridges and grooves don't actually distort the mesh. For example, the mortar between bricks in a wall will always be visible even though at shallow angles the bricks should stick out and generally hide it.

Parallax Mapping

A technique to give surfaces the illusion of occluding (hiding) parts of themselves even though the geometry of the mesh isn't detailed enough to hide them.

The height of the texture is stored in a heightfield. The values will offset the texture coordinates for other texture data fetches, resulting in a different part of the texture being used than normal!

The calculation to offset the texture takes the returned height value and the view direction into account which needs to be in tangent space. The heightfield value can also be scaled and given a bias (a value added to every height value).

Again, there's a problem with shallow angles where a small change in the view direction can result in unwanted, big changes in texture coordinates. Also a new problem, stereoscopic rendering will often not work well since the calculation may return inconsistent depth/offset values for the same point (because of the 2 different view direction angles).

Given an original texture coordinate p, adjusted (scaled and biased) height value h, and normalized view vector v (transformed into tangent space), the equation is: p_adj = p + h * v_xy, where v_xy is the x and y components of v.

That equation is technically called parallax mapping with offset limiting, because an earlier equation divided by the view's height (z) which caused erratic sampling at shallow angles and the new equation as shown above limits the amount of offset that can occur. As you can imagine, dividing by a number less than 1 will cause extreme scaling.
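Both versions of the equation side by side, as a sketch. The scale and bias defaults are placeholder values picked for the example, not values from the book, and the view vector is assumed to be normalized and already in tangent space.

```python
def parallax_offset_limited(uv, height, view_ts, scale=0.04, bias=-0.02):
    """Offset limiting: shift the UV by h * v_xy, never further than |h|."""
    h = height * scale + bias
    vx, vy, _vz = view_ts
    return (uv[0] + h * vx, uv[1] + h * vy)


def parallax_classic(uv, height, view_ts, scale=0.04, bias=-0.02):
    """Earlier form: also divides by v_z, which blows up at shallow angles."""
    h = height * scale + bias
    vx, vy, vz = view_ts
    return (uv[0] + h * vx / vz, uv[1] + h * vy / vz)
```

Looking straight down (v = (0, 0, 1)) neither version moves the UV at all. At a grazing view like (0.995, 0, 0.1) the classic version shifts the UV about 10 times further than the limited one.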

Parallax mapping is cheap and works amazingly well and has become standard in real time rendering. However, the mapping equation tends to overcompensate (adjust the UVs too far) and fails for heightmaps where there are rapid, large differences between nearby texels.

Relief Mapping

Given a heightfield texture, the view ray is projected onto the texture surface as a line, and samples of the heightfield at regular intervals are taken along this line. Then, a new line is formed connecting these sampled heights where the height pushes the line above the texture. The intersection that's closest to the view ray is then used as the new UV sampling point. The calculated UV value is not one of the heightfield samples from earlier, but possibly somewhere in-between.
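Here is a sketch of that march, under a few assumptions: the heightfield is treated as a depth field (height 1 is the surface, height 0 is the deepest point), height_at is a hypothetical function that samples it, the view vector is in tangent space and points from the surface toward the eye, and depth_scale is a made-up parameter for how deep the relief goes in UV units.

```python
def relief_march(height_at, uv, view_ts, depth_scale=0.1, steps=32):
    """Walk the view ray into the surface and return the UV where it hits."""
    vx, vy, vz = view_ts
    # UV shift accumulated by the time the ray reaches maximum depth.
    max_du = -vx / vz * depth_scale
    max_dv = -vy / vz * depth_scale
    prev_t, prev_diff = 0.0, None
    for i in range(steps + 1):
        t = i / steps                        # ray depth, 0 (top) to 1 (bottom)
        u, v = uv[0] + max_du * t, uv[1] + max_dv * t
        diff = (1.0 - height_at(u, v)) - t   # > 0 while the ray is above the surface
        if diff <= 0.0:
            if prev_diff is None:
                return (u, v)                # already touching at the first sample
            # Intersect the ray with the line joining the last two height samples.
            w = prev_diff / (prev_diff - diff)
            t_hit = prev_t + (t - prev_t) * w
            return (uv[0] + max_du * t_hit, uv[1] + max_dv * t_hit)
        prev_t, prev_diff = t, diff
    return (uv[0] + max_du, uv[1] + max_dv)  # never hit: use the deepest point
```

Because of the final interpolation the returned UV is usually between two of the regular samples, as described above. Raising steps for grazing view angles is the simple way to get the extra samples mentioned next.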

More view ray intersection points are generated for grazing angles to help avoid flickering issues or incorrect samples.

Also called Parallax Occlusion Mapping (POM) or Steep Parallax Mapping.

The heightfield is sometimes used as a depthfield where the highest point is the mesh's surface and the texture describes how deep it goes instead of how high.

Normals are usually still provided via normal mapping, but to save memory they can be calculated using the heightmap in a way like this.

Relief Mapping has issues where the mesh ends, but the visual effect implies there should be more pixels rendered to the screen. This happens because fragments are only generated for where the mesh is on screen, and relief maps give the illusion of a more complex mesh. Shell Mapping extrudes the mesh so that more fragments are generated (the extruded mesh is called a shell).

Another issue is efficiency when sampling a heightmap texture with large unchanging sections that would be nice to skip over sampling wise. Algorithms that address this include cone step mapping and quadtree relief mapping.

Heightfield Texturing

Aka Displacement Mapping using a Displacement Texture. The displacement texture is the same as a heightmap.

Here, the work is done in the vertex shader instead! Vertices are modified based on the texture. I assume you would provide essentially a flat plane that would normally be one triangle, but is broken up into many triangles that all span a heightmap.

It used to be the case that accessing texture data from the vertex shader was inefficient and ill-advised, but the move to “unified shaders” and general hardware improvements have made this far less an issue and it's now a suitable method for adding detail to a mesh via textures.

Caveats to Bumpmapping

The biggest is collision detection. Since collision engines will not know about the deformed mesh, physical objects can interact with it strangely, like a ball rolling smoothly along a cobblestone road, seeming to ignore any large stones and gaps.

Similarly, animations can be tricky to build.

Blinn, Phong, Lambertian… these are all old school lighting concepts. Some are from the 1970s! We can do more, do better!

This part of the book starts off with two HEAVY chapters on physics of lights, physics of colors, and how computer monitors simulate colors. It's a very heavy intense read that, tbh, did not mean much to me. It's important under certain circumstances, but I only care about modern 3D rendering tools and terminology, not how to build a monitor. Physically Based Rendering (PBR) will depend on physically based models of lighting, but we can gloss over the details. Unless we want to build a PBR engine from the ground up and totally comprehend what the equations do and why.

For now, some terms and definitions

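• Solid Angle - the 3D version of an angle: the area of a patch on a unit sphere.
• Steradian - the unit a Solid Angle is measured in. A 1 unit-squared patch on a unit sphere is 1 steradian.
• Differential Solid Angle - a tiny patch of a solid angle.
• Radiance - Light traveling along a single ray, like light emanating from something toward the eye, measured as power per area per solid angle (Watts per square meter per steradian).
• Irradiance - Light arriving at a surface from all directions, measured as power per area (Watts per square meter).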

BRDF

Stands for bidirectional reflectance distribution function. It is simply a function that tries to more physically replicate how light reflects off a surface.

It examines the ratio of outgoing light over incoming light, where the exact values are based on the incoming light's direction to the surface and the outgoing light’s reflected direction.

The BRDF is usually written as so

Where the inner function is

This inner function returns an rgb color that is the result of the outgoing radiance L over the light's irradiance E times the cosine of the light's angle of incidence (clamped to 0 to prevent lights from behind lighting the surface).

This is known as the modern lighting equation. For each light in the scene, calculate one light's color (the inner function) and component-wise multiply it with the light's irradiance color and the cosine, clamped to 0 again. Then you sum up this calculation for each light hitting the surface.
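The two equations referred to above are not reproduced in these notes. Going only by the descriptions, they plausibly look like the following for n point or directional lights. The symbols are assumptions: l is the direction to the light, v is the direction to the viewer, E_L is the light's irradiance, theta_i is the angle of incidence, the bar over cos means "clamped to 0", and the circled times symbol is component-wise RGB multiplication.

```latex
% Lighting equation: sum the contribution of each of the n lights
\[
L_o(\mathbf{v}) = \sum_{k=1}^{n} f(\mathbf{l}_k, \mathbf{v}) \otimes E_{L_k} \, \overline{\cos}\,\theta_{i_k}
\]

% Inner function (the BRDF) for a single light
\[
f(\mathbf{l}, \mathbf{v}) = \frac{L_o(\mathbf{v})}{E_L \, \overline{\cos}\,\theta_i}
\]
```

Read this way, the second equation is just the first one rearranged for a single light, which matches the "ratio of outgoing light over incoming light" description.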

The BRDF is a function of 4 scalars - 2 each for incoming direction and outgoing direction. Each has an angle above the surface and azimuth or angle about the normal.

In the end, the BRDF is a complicated equation. Attempts to simplify complexity have been made to create variations on the BRDF, like a Lambertian BRDF model. Different models have different lighting properties and advantages.

The book continues to thoroughly discuss concepts of reflectance and micro geometry and microfacets and attempts to create equations that model reality. It is very math heavy and worth reading if you want to implement your own BRDF shaders from scratch.

I skipped over the section on non-point lights i.e. area lights because it's basically more of the same thing, just rounding out the complexity by taking away the point light simplification.

In BRDF, there are 2 phases - calculating the incoming light and calculating how that interacts with the material to produce the outgoing light.

These 2 phases need to be done for each light, for each pixel! And if light sources are different types (directional versus area) and if materials are different (metal simulation versus translucent stained glass) then a shader needs to be made for each and every combination!!! Half-Life 2 had 1920 unique combinations!! Wow!

Looping over lights in a pixel shader, where light types can change dynamically, is too inefficient. Manually writing thousands of shaders is impossible too. The solution is to write one large shader file called an übershader that can be compiled selectively (with preprocessor tools) to produce the exact shaders needed!

Another solution is to use multipass lighting where every object in the scene is rendered once using each light. New shaders only need to be written for each light type, but still one per material.

Deferred Shading is based on using render targets and performing all visibility testing before any lighting. The idea is to, in a single pass, output all the shading data an object needs to be rendered. The depth data (z buffer), normals, texture coordinates, and material parameters are all output as separate render targets, called G-buffers (Geometry buffers). Next, passes are made that render the lights, using the G-buffers’ data! A full screen quad is used to ensure the entire window is drawn to. Multiple lighting shaders can be used, using the same G-buffer data, and their results can be mixed together.

There are many benefits to deferred shading over other attempts to make physically based rendering! Vertex programs are only run once, there's no need to shuffle geometry data and interpolate it repeatedly, and no need to pre-determine which lights affect the scene. Best of all, it drops the number of unique shaders down. You only need a shader per material (to generate the G-buffers) and one per light type. This also means it's much easier to play with new programs!

The downsides are that multiple render targets eat bandwidth and fill rate quickly for large, complex tasks. Even still, deferred rendering works well!

Environmental Lighting

Ah FINALLY, the chapter where we examine the problem that has been interesting me for YEARS.

We know how to render an object in a simple way - incoming light(s) bounce off the surface and result in a color on the screen.

But how do we handle that definitive PBR (Physically Based Rendering) look where, clearly, light can bounce off objects and then illuminate others, the quality of ambient lighting is so good! How do you do this?

The book describes the problem as a recursive one without end. The color resulting in our render is the sum of all incoming rays of light hitting a point on a surface, and some of those rays are coming from other surfaces, whose exiting light ray needs to be calculated the same way… from rays of light calculated the same way and so on.

This problem will be solved throughout the chapter in a variety of ways, each taking a look at specific techniques to simulate effects that a truly infinite light bounce would do like ambient occlusion and environment reflections.

Heckbert Notation

The researcher Paul Heckbert gave his name to a notation scheme that works like regular expressions, but for rendering models.

A model can be represented as a series of steps where Light hits some objects then finally enters the Eye. The notation is read left to right.

• L - a light
• E - the eye/camera
• D - a diffuse surface (just reflects color)
• S - a specular surface (shiny)

The classic rendering model is represented as L(D|S)E. Light hits a diffuse or specular surface, then enters the eye. This shows that for anything to be seen, a surface must be hit by a light; lights themselves do not show up. If lights were handled by the classic model, then the notation would become L(D|S)?E where the question mark indicates there is either 0 or 1 surfaces hit by a light before being rendered. The 0 surfaces case means light directly enters the eye. If you think about it, in classic models, there is no rendering of a raw light source itself. Only objects with geometry are rendered.

This notation simply provides a way to quickly summarize lighting models and how they work. Returning to our problem, it can be written as L(D|S)*E.
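Since the notation really is regular expression syntax, the three models above can be checked with an ordinary regex engine. In this sketch a light path is written as a string of the letters L, D, S, E, and the constant names are made up.

```python
import re

CLASSIC = re.compile(r"L(D|S)E")         # exactly one surface bounce
VISIBLE_LIGHTS = re.compile(r"L(D|S)?E")  # zero or one bounce
GLOBAL = re.compile(r"L(D|S)*E")          # any number of bounces


def handles(model, path):
    """True if the rendering model accounts for the given light path."""
    return model.fullmatch(path) is not None
```

So handles(CLASSIC, "LDE") is True, handles(CLASSIC, "LE") is False because a bare light is never drawn, and only GLOBAL accepts a multi-bounce path like "LDSSDE".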

First some definitions

• Occluder - casts a shadow
• Umbra - the region of a shadow that's most dark, the interior of a shadow.
• Penumbra - the soft edges of a shadow
• Hard shadow - has no penumbra

And some physical concepts

• Point lights only generate hard shadows.
• Area and volume lights may generate soft shadows.
• Soft shadows grow harder (lose penumbra) as the occluder approaches the receiver.
• The reverse also holds: the shadow gets softer the further the occluder is from the receiver and the bigger the light source.

Planar shadows refer to objects casting shadows onto a flat plane. The process is straightforward - use a matrix to transform the occluder onto the 2D plane surface. To make sure the shadow is always drawn on top of the receiver, either give the shadow a bias (translate it a bit above the surface) or disable the z buffer while rendering the shadow.

If you have translucent shadows, you can end up with the translucent shadows being drawn on top of each other which is unrealistic. One fix is to use a stencil buffer. Render the occluder to the stencil buffer, incrementing the stencil value by 1, then draw the shadow only when the stencil buffer has a value of 1, forcing each shadow pixel to only be drawn once.
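What the projection does to a single vertex can be sketched as below, assuming a point light and the ground plane y = 0. This follows the ray from the light through the vertex until it hits the plane; a projection matrix packs the same arithmetic into one multiply. The function name is hypothetical.

```python
def project_onto_ground(vertex, light):
    """Project an occluder vertex onto the plane y = 0, away from a point light."""
    vx, vy, vz = vertex
    lx, ly, lz = light
    if ly <= vy:
        raise ValueError("light must be above the vertex")
    # Scale the light-to-vertex step so its end point has y == 0.
    t = ly / (ly - vy)
    return (lx + t * (vx - lx), 0.0, lz + t * (vz - lz))
```

With a light at height 10, a vertex at height 5 lands twice as far from the point under the light, and a vertex already on the ground stays put.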

Imagine rendering a scene from the point of view of a light source. Whatever the light can see is lit, everything else is in shadow. Now, instead of rendering a scene, occluders are rendered as black colors onto a white texture. This texture is called a shadow map or shadow texture.

Since it's a texture, it can be used to wrap curved objects quite well! The downside is objects have to be marked as a receiver or a caster in the application/engine, casters cannot cast shadows on themselves, and the map has to be updated every time something moves in the light.

Shadow volumes, aka volumetric shadows. When a point light hits an occluder, simplified down to a single triangle, the volume of light it hits makes a 3D, 3-sided pyramid with the light at the tip. Everything below the occluder triangle is still part of that pyramid, but now makes up the occluder's shadow!

When a ray from the camera is cast through the scene, each time it passes through a front face of a shadow volume it's considered in shadow, and when it leaves through a backface it's out of shadow! Pretty nifty.

The downside is that pixels have to be drawn many, many times through many passes, generally only allows for hard shadows, and doesn't handle translucency well.

The scene is rendered from the point of view of a light. Only depths are recorded (only the z buffer), all else is turned off. Shadows can then be rendered by determining if the pixel is further away than the value in the shadow map. The shadow map basically contains z values that represent the closest object the light encountered, and thus the thing occluding the light. If a pixel is behind an occluder (has a greater depth) then it must be in shadow.
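The depth comparison itself is tiny. This sketch assumes the shadow map is a 2D list of depths as seen from the light, that the pixel has already been transformed into the light's view (giving light_uv in [0, 1] and depth_from_light), and it adds a small bias, which is a common extra not mentioned above, to stop surfaces from shadowing themselves due to precision.

```python
def in_shadow(shadow_map, light_uv, depth_from_light, bias=0.005):
    """True if something closer to the light covers this point."""
    h, w = len(shadow_map), len(shadow_map[0])
    # Nearest texel lookup, clamped to the map's edges.
    x = min(max(int(light_uv[0] * w), 0), w - 1)
    y = min(max(int(light_uv[1] * h), 0), h - 1)
    return depth_from_light - bias > shadow_map[y][x]
```

The occluder itself has the same depth as the stored value, so it tests as lit, while anything further along the same light ray tests as shadowed.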

The issues with this approach are that the depth buffer only has a certain amount of precision, causing shadow artifacts to appear when the light is far away compared to the viewer. The fix being an algorithm called LiSPSM which is a complex algorithm to transform the light's projection matrix to kinda match the view's so they both share a similar sampling pixel density.

I'm noticing an unspoken idea in the book - shadow maps are generally made for just one light, a single directional light like a moon or sun. I guess whatever algorithm is used, if multiple lights cast shadows then the algorithm has to be run for each?..

Cascade Shadow Mapping (CSM) is a technique to use multiple shadow maps at various densities (further from the camera, the lower the texel density) and have objects use different maps based on how far they are from the camera!

Percentage-Closer Filtering

Creates soft shadows by sampling shadow map results around a point instead of just a single location! The sample results (in shadow or not) are averaged together, with more samples meaning more granularity and softness.

The downside here is the large amount of time it takes to perform all the sample calculations, plus a few other artifacts and sub calculations that have to be accounted for at each possibly shadowed pixel.
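A sketch of the idea using a square window of shadow map texels and an unweighted average. The window shape, the bias value and the function name are assumptions for the example; real implementations often use smarter sample patterns and weights.

```python
def pcf_shadow(shadow_map, x, y, depth_from_light, radius=1, bias=0.005):
    """Fraction of nearby depth tests that say 'in shadow': 0.0 lit, 1.0 dark."""
    h, w = len(shadow_map), len(shadow_map[0])
    hits = 0
    count = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx = min(max(x + dx, 0), w - 1)   # clamp to the map's edges
            sy = min(max(y + dy, 0), h - 1)
            # Compare first, then average the yes/no results (not the depths).
            if depth_from_light - bias > shadow_map[sy][sx]:
                hits += 1
            count += 1
    return hits / count
```

With radius 1 there are only 10 possible shadow values (0/9 through 9/9), which is the granularity issue: softer, smoother edges need more samples.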

Variance Shadow Maps use math to estimate how likely a receiver is to be lit, based on the difference between occluder and receiver depths. This allows the depth map (shadow map), which stores depth and depth squared, to be filtered and otherwise used like a regular texture, so it's much more optimized for rendering and produces great results! This is a good tool for environment/terrain shadowing.

Main downside is artifacts where light will “leak” through a shadow in areas where one occluder shadows another occluder.
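The technique described here matches what is usually called variance shadow mapping, where the "math" is Chebyshev's inequality. A sketch, assuming the filtered map gives the mean depth and mean squared depth at the lookup point; the min_variance clamp is a common practical addition and the name vsm_visibility is made up.

```python
def vsm_visibility(mean_depth, mean_depth_sq, receiver_depth, min_variance=1e-5):
    """Upper bound on the fraction of light reaching the receiver (0..1)."""
    if receiver_depth <= mean_depth:
        return 1.0  # receiver is in front of the average occluder: fully lit
    variance = max(mean_depth_sq - mean_depth * mean_depth, min_variance)
    d = receiver_depth - mean_depth
    # One-sided Chebyshev bound on P(occluder depth >= receiver depth).
    return variance / (variance + d * d)
```

The leaking shows up when two occluders at very different depths overlap: the variance gets large, so the bound stays well above 0 even for points that are fully shadowed.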

Ambient Occlusion

Shadows need to fill cracks and little spaces of mesh to provide realism. Ambient light is general, directionless light that just exists. Ambient Occlusion is shadowing that is caused by this light.

The theory for shading this is examining how visible a point on a mesh is to the outside world. A section in a crevice or corner will be very occluded so should be darker than a point on the outside of the mesh. The more ways a point can receive light from the world, without occlusion, the brighter and less affected by ambient occlusion it is.

This works well for models, but won't work for scenes with multiple separate objects. In that case, we can do ray casts and see how far away other intersections are. If they're far away, no or low occlusion! Taking the casts to infinity will always result in occlusion for enclosed spaces, so a maximum ray length is needed.

True ambient occlusion is an unsolvable infinitely recursive problem because light bounces so many places and outgoing radiance can't be calculated until other radiance are calculated which can't be calculated until…

An actual implementation technique uses the z buffer after initially rendering the scene. For each pixel, nearby depths are sampled randomly around it in a sphere. The ratio of hidden to visible samples determines the occlusion. Although good results need 200+ samples, which is too much to process, a solution is to use at most 16 samples and then blur the result. Another solution is to use an unsharp mask algorithm on the z buffer.

Reflections

Reflections are images of a scene, mirrored about the reflective geometry. A straightforward and practical solution is to mirror the entire scene's geometry, render it, and use that as a texture to render the mirror. Similarly, you can render the scene from the camera's mirrored point of view and use that as the mirror texture. Both achieve the same effect. It's recommended to use the stencil buffer so that only geometry with the mirror material render the mirrored geometry.

An issue arises when geometry cuts through a reflective plane (like a rock sticking out of water): the rock geometry below the water is mirrored too, so the part under water now appears above it, which is incorrect. A fix is to use a custom clipping plane (somehow) right at the reflective plane that will ensure objects below it do not get rendered.

Once you have the reflected environment map as a texture/image, you can do so many techniques on top of it like using bumpmap techniques for rippled water or textured/frosted glass!

For non-plane geometry, like a sphere or creature, the most accurate technique is to use ray tracing (discussed later) or render environment maps (EM) each frame, using the previous frame’s EM to make the current one. This produces a nearly recursive EM, which is good and accurate.

Transmittance

Fancy word for a more physical/real take on translucency. The concept is to have the material act more like a filter - incoming rays only output certain amounts of colors along the light spectrum. Thicker parts of the mesh should filter more light than thinner parts as well.

The Beer-Lambert Law provides a math formula to derive how much filtering happens based on the model's thickness, and someone named Bavoil introduced a simplification to that equation for real time rendering. All that really needs to happen in the shader is to calculate how far the current view ray (per pixel) travels through the model. This can be done by first rendering the back faces to the depth buffer, then reading those values when rendering again like normal.
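A sketch of the unsimplified law, where transmitted light falls off exponentially with thickness. The per-channel absorption coefficients are made-up inputs, and the thickness is taken as the difference between the back face depth and front face depth along the view ray, as described above.

```python
import math


def thickness_from_depths(front_depth, back_depth):
    """Distance the view ray travels inside the model (never negative)."""
    return max(back_depth - front_depth, 0.0)


def transmittance(absorption, thickness):
    """Beer-Lambert: fraction of light surviving, per color channel.

    absorption is an (r, g, b) tuple of coefficients; bigger means that
    channel is filtered out faster.
    """
    return tuple(math.exp(-a * thickness) for a in absorption)
```

With absorption (0, 1, 2), red passes through untouched while blue fades fastest, so thick parts of the model look red and thin parts stay closer to white.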

Refraction

The effect of light being curved when it changes medium, like how a straw in a cup of water will look bent.

Dispersion is a property where different wavelengths are bent by different amounts. This is how prisms form the iconic rainbow effect. To replicate this would mean managing many rays of light being made from a single light ray, which is too much to ask for.

Back to refraction...

One solution is to render a cubemap environment map at the location of the refraction model, then refer to that map when the model itself is rendered.

This doesn't account for backfaces, which is usually fine anyway. A technique using ray tracing considers backfaces and total internal reflection if desired.
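
To look up the environment map, the shader needs the bent view direction. A rough sketch using Snell's law follows (the notes above don't name it, so treat that as my assumption about how the lookup direction is found). It has the same form as the built-in refract function in GLSL; `eta` is the ratio of refractive indices (outside / inside) and both vectors are assumed to be normalized.

```python
import math


def refract(incident, normal, eta):
    """Bend a normalized incident direction through a surface.

    incident points toward the surface, normal points away from it.
    Returns None on total internal reflection.
    """
    n_dot_i = sum(n * i for n, i in zip(normal, incident))
    k = 1.0 - eta * eta * (1.0 - n_dot_i * n_dot_i)
    if k < 0.0:
        return None  # total internal reflection, no refracted ray exists
    scale = eta * n_dot_i + math.sqrt(k)
    return tuple(eta * i - scale * n for i, n in zip(incident, normal))


# Looking straight down into water (index about 1.33): the ray is not bent.
straight = refract((0.0, 0.0, -1.0), (0.0, 0.0, 1.0), 1.0 / 1.33)
```

The returned direction is what would be used to sample the cubemap instead of the plain view direction.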

Caustics

A fancy term for when light is focused into specific spots or designs, like through a magnifying glass or circular designs from a cup of water.

An image based solution works like a shadow map. The scene is rendered from the point of view of the light, and whenever the light passes through a reflective or refractive object and bends, the location it ends up at is marked. The result is called the photon buffer. Spheres representing the accumulated light, called splats, are generated from the photon buffer. The splats are rendered in a second pass, transformed into the camera's point of view, into a caustic map, which is then rendered in the third and final pass.

Other physically based algorithms exist like caustic volumes, but sometimes a simple hack provides visually pleasing results, like projecting a caustic texture onto the scene. A special algorithm for water/sea fanatics is to ray trace from the sea floor up to the surface, where the ray is then bent. The more bent the ray, the less light drawn at that point on the floor.

Global Subsurface Scattering

Subsurface scattering refers to when light changes direction after entering a physical object before it is emitted from that object. When the light doesn't change direction too much, like a creature's skin, it's handled by lighting equations like BRDF. Global means the light has traveled a longer distance, more than a single pixel.

Physically, light is scattered differently depending on its wavelength. For example, air scatters blue more than red light, giving the appearance of blue sky.

When light is scattered, it can either bounce once inside the material, or many times. Rendering algorithms are grouped up by single bounce or multi-bounce methods. Multiple scattering leads to more pleasing results, though. As light is scattered, more of its energy is lost (absorbed) into the material.

Wrap Lighting

A technique where light is wrapped around a curved object, softening the transition from light to dark. Changing the hue of the light, like more red for white/light skinned humans, works well here.
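
A small sketch of one common way to write wrap lighting, assuming the usual form (N.L + wrap) / (1 + wrap). The function name and the `wrap` parameter are my own labels; a wrap of 0 is plain Lambert diffuse.

```python
def wrap_diffuse(n_dot_l, wrap):
    """Diffuse term where light wraps past the usual 90 degree cutoff.

    n_dot_l: dot product of the surface normal and the light direction.
    wrap: 0.0 for standard diffuse, up to 1.0 to wrap all the way around.
    """
    return max(0.0, (n_dot_l + wrap) / (1.0 + wrap))


# A point facing slightly away from the light is black with standard
# diffuse lighting, but still receives some light once wrapping is on.
standard = wrap_diffuse(-0.25, 0.0)
wrapped = wrap_diffuse(-0.25, 0.5)
```

The hue shift mentioned above could be layered on by tinting the extra light that only appears because of the wrap.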

Normal Blurring

This comes from the observation that specular reflectance does not change with subsurface scattering and only cares about the normal map. Diffuse lighting (diffuse reflectance) can be used to simulate subsurface scattering by having it ignore the normal map, or use a blurred version of it.

Texture Space Diffusion

A model's diffuse color is rendered to a texture, then is blurred multiple times using different color filters. The results are then combined during a final shading. This technique is expensive because of all the rendering.

Full Global Illumination

This section covers algorithms that combine all of the thoughts on Global Illumination so far: algorithms that simulate a full, physically based, rendered scene. Typically it's too much work to do these in real time. Instead, the algorithms are often run against the static content in a scene, with the results computed and saved before the game runs and then looked up at run time.

Radiosity

Named after the first program used to demo the technology. It's a process of computing ambient/global light by allowing all diffuse surfaces to send out light into the scene. This way, light truly bounces off of surfaces as in real life to light up the entire scene.

The algorithm first generates patches over the surfaces in the scene. These could be vertices or polygons or whatever. Each patch has a hemisphere which details how much light travels from this patch to each other patch. A form factor value is calculated for each pair of patches, and denotes how much relative light travels between them. The distance between the patches, whether anything is blocking them, and which way they face are all considered in the calculation.

Handling infinite patch emissions is impossible, so you cut it off at some point. One solution is to first have the lights shoot out to the patches, and whichever patch is brightest will be the next to shoot out light. Then that patch shoots light and whichever other patch is now brightest takes its turn. This repeats some number of times and significantly reduces the number of patches to process while focusing on the most important (brightest) patches.
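
The "brightest patch shoots next" loop can be sketched as below. This is a simplified illustration, not the book's exact algorithm: it assumes all patches have equal area (so the form factor table is symmetric), treats lights as patches with emission, and uses made-up form factor values rather than computing them.

```python
def progressive_radiosity(emission, reflectance, form_factors, steps):
    """Shoot light from the brightest patch, a fixed number of times.

    form_factors[i][j] is the fraction of light leaving patch i that
    reaches patch j (hypothetical precomputed values).
    """
    radiosity = list(emission)  # total light leaving each patch so far
    unshot = list(emission)     # light each patch has not yet passed on
    for _ in range(steps):
        shooter = max(range(len(unshot)), key=lambda i: unshot[i])
        energy = unshot[shooter]
        if energy <= 0.0:
            break  # nothing left to distribute
        unshot[shooter] = 0.0
        for receiver in range(len(unshot)):
            if receiver == shooter:
                continue
            received = reflectance[receiver] * energy * form_factors[shooter][receiver]
            radiosity[receiver] += received
            unshot[receiver] += received
    return radiosity


# Two patches facing each other; patch 0 is a light, both reflect 50%.
result = progressive_radiosity(
    emission=[1.0, 0.0],
    reflectance=[0.5, 0.5],
    form_factors=[[0.0, 0.5], [0.5, 0.0]],
    steps=2,
)
```

After two steps the light has bounced to patch 1 and a little has come back to patch 0. Each bounce loses energy to absorption, so later steps matter less and less, which is why cutting off early works.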

This algorithm is still reserved for offline rendering mostly, though it has seen real time use.

Ray Tracing

Ray tracing is exactly what it sounds like, and so far the book has only provided a brief introduction to the concept, not so much how to actually implement it on hardware.

Rays are shot out from the camera/eye, through each pixel to be rendered to the screen, and color data is calculated as they go. At each surface a ray hits, color is calculated using a variety of techniques discussed earlier and by generating new rays pointing to each light the object is lit by. Reflection and refraction cause new rays to be generated, bouncing from one material to the next, until we decide we’re done bouncing. The rays toward the lights check whether anything is hiding the surface from a light source (putting it in shadow), or whether the thing obscuring it is translucent, in which case the shadow is colored light instead.
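
Two of the building blocks above, finding what a ray hits and checking for shadow with a ray toward the light, can be sketched like this. It's a toy under my own assumptions: the scene is only spheres, ray directions are normalized, hits from inside a sphere are ignored, and all names are hypothetical.

```python
import math


def hit_sphere(origin, direction, center, radius):
    """Distance along a normalized ray to the nearest hit in front of it, or None."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    discriminant = b * b - c
    if discriminant < 0.0:
        return None  # the ray misses the sphere entirely
    t = -b - math.sqrt(discriminant)
    return t if t > 1e-6 else None


def in_shadow(point, light_position, spheres):
    """Shoot a ray from the point to the light and see if any sphere blocks it."""
    to_light = tuple(l - p for l, p in zip(light_position, point))
    distance = math.sqrt(sum(v * v for v in to_light))
    direction = tuple(v / distance for v in to_light)
    for center, radius in spheres:
        t = hit_sphere(point, direction, center, radius)
        if t is not None and t < distance:
            return True
    return False


blocker = ((0.0, 0.0, -5.0), 1.0)
shadowed = in_shadow((0.0, 0.0, -10.0), (0.0, 0.0, 0.0), [blocker])
```

A full tracer would repeat the hit test for every pixel's ray and recurse for reflection and refraction rays.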

The upside to this technique is that it produces high quality physically-based visuals! It works on a variety of primitive types, not just triangles (e.g. curves).

There are several big challenges to this technique. The processing of all these rays doesn’t map well onto the traditional hardware-accelerated pipeline exposed through OpenGL and DirectX. That pipeline operates on the notion of processing meshes/triangles, producing fragments, then rendering some output data based on available information. Nothing about rays, handling object collision, reflection and refraction… Regardless, the parallel nature of GPUs does make them prime ground for processing this kind of data! Rays can be batched up by similar direction and processed together (Disney did a presentation on this, describing that’s how their renderer for Zootopia partially worked).

Other techniques

Irradiance Caching or Photon Mapping. The scene is rendered from the view of a light, saving how much light hits each surface. Next, surfaces examine nearby surfaces and store the indirect light they receive. Finally, a ray tracer does its work, calculating light using the light sources and these photon maps.

Precomputed Lighting

These techniques are slow because they take so many calculations and, as of the writing of this 3rd edition, GPUs haven't quite become adapted to processing them.

A solution is to compute this heavy, fancy, realistic lighting using the static geometry and static lights of a scene first and then refer to those results when rendering.

Simple Surface Prelighting

An old school technique used back in Quake II days. Static lights are rendered into the scene and the geometry stores the light data indicating the light hitting it. The data stored can either be a light map texture that will then be applied to the geometry or even have the data stored per vertex.

The downsides are that the lights can't change, it doesn't work with any dynamic objects, it has heavy memory storage costs, and the light has no directional component after it's saved.

Directional Prelighting

The concept is simple - add directional data to the stored prelight data (kinda introduced above). The solutions are quite complicated though. There are a variety of techniques that focus on how to store the data so it can be used during rendering.

One technique focuses on storing the data like a normal map, but they are all very math heavy and I cannot simplify them. Look into techniques like:

• Used by Half Life 2
• dot3 lightmaps
• Used by the Crysis engine

Volume Prelighting

This is the technique of generating ambient light (indirect light) environment maps at specific points in the scene and then letting the dynamic objects reference them to give dynamic objects ambient light as well. The challenge becomes interpolating between the environment maps as the objects move.

Precomputed Occlusion

Just as lighting in a scene can be precomputed, so can information about how obscured parts of the scene are.

Precomputed Ambient Occlusion

Focusing mostly on static geometry and static lights, a scene is processed for ambient occlusion, as described above, and results are stored. Possibly per vertex, or as textures.

Moving objects can take advantage of this too by computing AO (Ambient Occlusion) in a cube and saving the results as a cubemap. The map describes how the object affects nearby objects.

Animated objects take more work because they have different poses. One solution is to generate AO for a variety of important shapes and interpolate between maps or sample whichever is closest to the current pose.

Precomputed Directional Occlusion

Additional work that goes into preserving a directional aspect to the ambient occlusion. Does good work with heightmaps that can produce nice soft shadows.

Precomputed Radiance Transfer

This is a topic on how to precalculate work for everything else in global illumination - reflection, refraction, subsurface scattering, etc. The name comes from how irradiance transfers into radiance (incoming light to outgoing light).

Skipped over because it's very math intense and I'm getting tired of physically based algorithms.

Image Based Effects

This section essentially covers everything that cannot be modeled well with polygons - e.g. fur and clouds. Also topics on post processing and rendering a final image to certain, non-photorealistic styles.

Fixed View Effects

By simplifying to having a fixed view (the camera doesn't move), a lot of rendering can be saved and reused. Once rendered to a buffer, and resupplied per frame, any graphics buffer (G-Buffer) can be reused unless the camera moves. This can allow for techniques like rendering a complex scene/mesh once and then rendering tools and UI on top to interact with it. This is what Computer Aided Design (CAD) tools do, allowing for measuring, annotations, and other things. Another use would be texture painting - painting right on to a 3D model.

Golden Thread or Adaptive Refinement or Progressive Refinement are techniques where a scene is initially rendered at a quick, lower quality, then re-rendered at higher qualities unless the camera moves. The more detailed renders may use slower techniques and just focus on parts of the full display image at a time. These refinements are swapped in any way wanted, either instantly or faded in slowly.

Skyboxes

A cubemap texture is used to represent distant/far objects like the sun, mountains, clouds, etc. It's fast, efficient, and adds a lot of detail to a scene. The texture should be fairly high resolution to avoid artifacts.

In practice, the skybox is rendered using a cube mesh surrounding the scene, or a dome. Team Fortress 2 used a dynamic skybox system where a separate skybox environment map was rendered out of view of the player, then rendered as a skybox.

Light Field Rendering

The concept of taking MANY real life photos of a single object or scene. The images can then be processed to create a 3D model just from the images or a holographic like effect where the object/scene can be rotated around as though it were a 3D rendered scene! As of this book’s edition, there are not many realtime applications of this effect.

Light Field Rendering itself is using interpolation techniques to render new, computed, views of an object that a camera hasn't captured.

Sprites and Layers

A sprite is a generalized term for any 2D image displayed onto a screen. A sprite could be a plane that's always view facing or rendered using some other mesh. Usually the sprite has some transparent parts to it so it's not just a rectangle.

A layer is a set depth on which many sprites exist. All sprites on a single layer are rendered at the same time, and layers are rendered back to front. This allows for nice parallax effects (distant objects move less than nearer objects as the camera moves horizontally) by moving the distant layer less than the closer one. It also allows easy zoom effects, by scaling the closer layers more than further ones.
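
A tiny sketch of layer parallax and zoom, assuming each layer carries a made-up `parallax` factor between 0.0 (infinitely far, never moves) and 1.0 (moves fully with the camera). The function names are hypothetical.

```python
def layer_screen_x(world_x, camera_x, parallax):
    """Horizontal screen position of a sprite on a layer with the given parallax factor."""
    return world_x - camera_x * parallax


def layer_scale(zoom, parallax):
    """Scale applied to a layer: closer layers (higher parallax) zoom more."""
    return 1.0 + (zoom - 1.0) * parallax


# The camera moves 4 units right: the near layer shifts 4, the far layer only 1.
near_x = layer_screen_x(10.0, 4.0, 1.0)
far_x = layer_screen_x(10.0, 4.0, 0.25)
```

Layers would still be drawn back to front, lowest parallax factor first.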

Billboards

A billboard is a sprite that always faces the viewer. The 2D sprite rotates as the camera rotates.

Screen-aligned billboards strictly always face the screen. These work well for things like text or purely 2D games.

Viewport aligned billboards rotate a little, but generally face towards the camera. Technically they face the viewport, the view frustum. This is to purposefully create foreshortening and perspective effects with the sprites. May be a good option for imposters (images standing in for actual 3D mesh).

Some advanced techniques of billboards involve using animated explosion sprites and repeating them in many ways, randomly, to simulate a real explosion cloud. Relief mapping can help make it look even more 3D. Clouds also can be made the same way, repeating cloud billboards and varying their transparency and transforms.

Soft particles are a billboard technique where translucent billboards dynamically change their transparency per pixel - more transparent the closer it is to other objects. This solves problems where billboards intersect nearby geometry and kill the illusion that it's a volume effect like dust or smoke.
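
The per-pixel fade could look something like this, assuming the scene depth behind the particle is available (from a depth buffer) and that depths are linear. The `fade_distance` parameter is a hypothetical tuning value for how close to geometry the fade begins.

```python
def soft_particle_alpha(base_alpha, scene_depth, particle_depth, fade_distance):
    """Fade a particle pixel out as it approaches the geometry behind it."""
    gap = scene_depth - particle_depth
    fade = min(max(gap / fade_distance, 0.0), 1.0)
    return base_alpha * fade


touching = soft_particle_alpha(0.8, 5.0, 5.0, 0.5)   # right at the surface
halfway = soft_particle_alpha(0.8, 5.25, 5.0, 0.5)   # partway into the fade zone
far_away = soft_particle_alpha(0.8, 10.0, 5.0, 0.5)  # well clear of geometry
```

Pixels touching the geometry are fully transparent, so there is no hard line where the billboard cuts into it.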

Another artifact of billboards is when they suddenly pop out of existence as the camera moves, or characters pop in front of or behind them. A solution to this is to treat the area where the billboard is taking effect as a volume and render the billboard more or less transparent as needed and always on top of objects until they've fully left the volume.

Axial Billboards have a set 3D axis they're only allowed to rotate along, but from there they always try to face the viewer. These can be used to create beam particle effects where the beam trail is the axis on which the beam billboard rotates.
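
One way to build the orientation for an axial billboard, as a sketch: keep the fixed axis, and derive the other two directions from the direction to the viewer using cross products. The names are my own, and the axis is assumed to be normalized.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def axial_billboard_basis(axis, to_viewer):
    """Return (right, normal) for a billboard locked to rotate around axis.

    Returns None when the viewer looks straight down the axis, where
    no facing direction can be chosen.
    """
    right = cross(axis, to_viewer)
    length = math.sqrt(sum(c * c for c in right))
    if length < 1e-9:
        return None
    right = tuple(c / length for c in right)
    normal = cross(right, axis)  # faces the viewer as much as the axis allows
    return right, normal


# A vertical beam viewed from up and to the side: the normal turns toward
# the viewer horizontally but never tilts upward.
right, normal = axial_billboard_basis((0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
```

The billboard quad is then spanned by the axis and the right vector.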

Particle Effects

Basically any system where really small, tiny primitives are generated en masse and given some behavior. It's possible to render them as axial billboards for thicker lines.

Imposters

Imposters are sprite substitutes for 3D geometry. Either the object is substituted with a premade sprite, or it is rendered in real time to a texture. Rendering on the fly works better overall since you can generate a texture size to best fit the screen display and have the object oriented precisely.

Billboard clouds are collections of billboards used to represent one object or effect. Like a tree where branches and leaves or pine needles can be rendered each as a single billboard and their collection is referred to as a cloud.

Billboards and sprites can include depth information too, called a depth sprite or nailboard. These can offset depth values at the pixel level when rendering the sprite, which can really help fix some issues when the sprite is an imposter of a 3D object, or when you simply want faster rendering times with particles.

Post Processing

Post processing covers most any technique where an input image is worked on and modified to produce some altered output. It differs from everything else so far in that geometry is no longer the main source input.

Because of how GPUs still work, it's necessary to render a plane that covers the entire screen in order to trigger the pixel shader to run across the whole screen/image. It's silly, but also leads to a potential optimization where the screen can be segmented into tiles so only tiles that need processing are processed.

Kernels

Aka filters, kernels are a way to calculate a per-pixel value by weighting neighboring pixels by some amount and summing them. Gaussian and Sobel are names of popular kernels with specific weight values. For example, when a Gaussian kernel is applied to an image, it blurs the image.

Kernels have a set size, like 3x3 or 5x5 grid sizes. The weight values can be positive or negative. The center of the kernel represents the current pixel, with nearby pixels represented as one grid row/column over. Each pixel is multiplied by the weight value then summed up. This value is saved for the current pixel, and the next pixel is processed.
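
The multiply-and-sum process above can be sketched in plain Python on a grayscale image stored as a list of rows. This assumes a square kernel with an odd size and clamps at the image border (one of several possible ways to handle edges). The function and constant names are made up.

```python
def apply_kernel(image, kernel):
    """Weight each pixel's neighborhood by the kernel and sum it up."""
    height, width = len(image), len(image[0])
    half = len(kernel) // 2
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for ky, kernel_row in enumerate(kernel):
                for kx, weight in enumerate(kernel_row):
                    # Clamp so pixels past the border reuse the edge pixel.
                    sample_y = min(max(y + ky - half, 0), height - 1)
                    sample_x = min(max(x + kx - half, 0), width - 1)
                    total += weight * image[sample_y][sample_x]
            result[y][x] = total
    return result


BOX_BLUR_3X3 = [[1 / 9] * 3 for _ in range(3)]          # every neighbor weighted equally
IDENTITY_3X3 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]        # leaves the image unchanged

image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
blurred = apply_kernel(image, BOX_BLUR_3X3)
```

The single bright pixel gets spread evenly over its neighborhood by the box blur. A Gaussian kernel would do the same with weights that fall off from the center.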

Bilateral filters weight each neighboring pixel by both how far away it is and how similar its value is to the current pixel. This technique apparently has the ability to preserve edges (sharp changes in pixel color, like white on black), which most filters tend to blur the most.

Edge Detection

A handy technique for computer vision algorithms, but also leads to making cool effects where objects are outlined like they were inked in a comic. First, an image is blurred only along the horizontal direction, then a differential filter is run (comparing differences between neighboring pixels). Big changes mean an edge! It's that simple.
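
A sketch of the two steps (blur along the horizontal, then difference between vertical neighbors) on a grayscale image stored as a list of rows. The 1-2-1 blur weights and the threshold are my assumptions, borrowed from how the Sobel kernel is built. As written this only finds horizontal edges; a full detector would repeat it with the directions swapped.

```python
def blur_horizontal(image):
    """Smooth each row with 1-2-1 weights, clamping at the left and right borders."""
    width = len(image[0])
    return [
        [(row[max(x - 1, 0)] + 2 * row[x] + row[min(x + 1, width - 1)]) / 4
         for x in range(width)]
        for row in image
    ]


def difference_vertical(image):
    """Difference between the pixel below and the pixel above, clamped at borders."""
    height = len(image)
    return [
        [image[min(y + 1, height - 1)][x] - image[max(y - 1, 0)][x]
         for x in range(len(image[0]))]
        for y in range(height)
    ]


def detect_edges(image, threshold):
    """True wherever the smoothed image changes sharply from one row to the next."""
    differences = difference_vertical(blur_horizontal(image))
    return [[abs(value) > threshold for value in row] for row in differences]


# Dark top half, bright bottom half: the edge runs between rows 1 and 2.
image = [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]
edges = detect_edges(image, 0.5)
```

The rows on either side of the dark/bright boundary are flagged, and the flat areas are not.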

Tone Mapping

An image is redrawn or rendered with a different luminance (brightness) scaling across the image. Good lighting is so important for a render. Too bright overall and it looks washed out, too dark and nothing can be seen. The goal is to use contrasts.

Inspired by how human vision adjusts based on the level of light around them, the goal is to remap the lightness of pixels in an image to fit a certain goal, so dark image will be brightened and a bright image will be dimmed while preserving contrasts ideally.
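
As one concrete example of remapping, here is a sketch of the Reinhard global operator. These notes don't name a specific operator, so picking this one is my assumption. It scales luminance so the image's log-average lands on a target `key` value, then compresses with L / (1 + L) so the output stays below 1.

```python
import math


def reinhard_tone_map(luminances, key=0.18, delta=1e-6):
    """Remap scene luminances into the displayable range [0, 1)."""
    # Log-average luminance; delta avoids log(0) for pure black pixels.
    log_average = math.exp(
        sum(math.log(delta + value) for value in luminances) / len(luminances)
    )
    mapped = []
    for value in luminances:
        scaled = key / log_average * value
        mapped.append(scaled / (1.0 + scaled))
    return mapped


dark_image = reinhard_tone_map([0.01, 0.01, 0.01, 0.01])
bright_image = reinhard_tone_map([100.0, 100.0, 100.0, 100.0])
mixed_image = reinhard_tone_map([0.1, 1.0, 10.0])
```

The uniformly dark and uniformly bright images end up at nearly the same display brightness, while the mixed image keeps its ordering from dark to bright.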

High Dynamic Range

High Dynamic Range (HDR) is

Other Concepts

• Color correction
• Any technique where an image's per pixel color is transformed or altered in some way, like tinted blue

Special Effects

Stencil Buffer

How to use the stencil buffer! It acts like a global rendering mask that fragments can additionally be tested against to see if they should be drawn.

Unfortunately, by default the geometry that writes to the stencil buffer also writes to the color buffer. You could output a purely transparent color, but that's inefficient. Other options are to output a flat color and make sure it's overwritten later, or output a neutral/dark color so it's not noticeable. In OpenGL, color writes can also be switched off entirely with glColorMask while the mask is drawn.

Writing to the stencil buffer is similar to writing to the color buffer in that geometry is needed to generate fragments indicating which “pixels” in the stencil buffer will be written to. The difference is it's not programmable like the fragment shader, instead you have a few options to play with.

Basic flow per render/effect:

1. Enable stencil writing
2. Render whatever is creating the stencil mask
3. Disable stencil writing, enable reading
4. Render object(s) using the values now in the stencil.

At some point you will want to clear the stencil buffer between complete frames.

Use glEnable and glStencilFunc to set up the stencil test before rendering. This tells which comparison to use and gives a hard value (called the reference value) to compare stencil buffer values to. Also pass in a bit mask that is ANDed with both the reference value and the current stencil buffer value before comparing. For example, using the equals comparison, a reference value of 1, and a mask of 0xFF results in a stencil test where only stencil values of 1 are considered passing.
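
To make the comparison concrete, here is a plain Python simulation of what the stencil test does per fragment. It is not OpenGL code: the string names are stand-ins I made up for the GL comparison enums, and `stored` stands for the value currently in the stencil buffer.

```python
import operator

# Stand-ins for the GL comparison functions (names are hypothetical).
COMPARISONS = {
    "never": lambda reference, stored: False,
    "always": lambda reference, stored: True,
    "equal": operator.eq,
    "notequal": operator.ne,
    "less": operator.lt,
    "lequal": operator.le,
    "greater": operator.gt,
    "gequal": operator.ge,
}


def stencil_test(func, reference, mask, stored):
    """Mask both values, then compare the reference against the stored value."""
    return COMPARISONS[func](reference & mask, stored & mask)


# Equals comparison, reference 1, mask 0xFF: only a stored value of 1 passes.
passes = stencil_test("equal", 1, 0xFF, 1)
fails = stencil_test("equal", 1, 0xFF, 2)
```

With a narrower mask like 0x01, only the lowest bit takes part in the comparison, so a stored value of 3 would also pass against a reference of 1.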

Use glStencilOp to allow updating of the buffer. Tell it what to do when the stencil test fails, if the stencil passes but depth test fails, or both tests pass. There are several options like keep (no change), replace (with the reference value that's also used for the test), increment or decrement by 1, increment or decrement by 1 with wrapping.

Optionally, you can provide a bit mask via glStencilMask that determines which bits are allowed to be written to. Only 1 bits are allowed to be written, 0 bits are write protected. By extension, a mask of 0x00 means writing is effectively turned off regardless of the op settings.
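
The update rules and the write mask can be simulated the same way. Again this is plain Python rather than OpenGL: it assumes an 8-bit stencil buffer, and the op names are made-up stand-ins for the GL enums.

```python
def choose_op(stencil_passed, depth_passed, on_stencil_fail, on_depth_fail, on_pass):
    """Pick which of the three configured ops applies to this fragment."""
    if not stencil_passed:
        return on_stencil_fail
    return on_pass if depth_passed else on_depth_fail


def apply_stencil_op(op, stored, reference, write_mask=0xFF):
    """Compute the new 8-bit stencil value, honoring write-protected bits."""
    if op == "keep":
        result = stored
    elif op == "replace":
        result = reference
    elif op == "incr":
        result = min(stored + 1, 0xFF)   # clamps at the maximum
    elif op == "decr":
        result = max(stored - 1, 0)      # clamps at zero
    elif op == "incr_wrap":
        result = (stored + 1) & 0xFF     # 255 wraps around to 0
    elif op == "decr_wrap":
        result = (stored - 1) & 0xFF     # 0 wraps around to 255
    else:
        raise ValueError("unknown stencil op: " + op)
    # Only bits set in the write mask change; the rest keep their old value.
    return (stored & ~write_mask & 0xFF) | (result & write_mask)


op = choose_op(True, True, "keep", "keep", "replace")
written = apply_stencil_op(op, stored=0, reference=1)
protected = apply_stencil_op(op, stored=0, reference=1, write_mask=0x00)
```

With a write mask of 0x00 the stored value is untouched even though the op is replace, matching the note above about writing being effectively turned off.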

That's it! It's up to you to figure out how to use it and what to do with the mask.