Notes From Rendering Textbook
This is a general description of how things get rendered. The following are stages a program goes through.
During the application stage, Rendering Primitives are created and sent to the Geometry stage.
Rendering Primitives are points, lines, and triangles.
Although we think of the camera moving around the world, the math works out better if we move THE ENTIRE WORLD and leave the camera at the center.
Covered in detail later.
This stage may change the number of primitives produced, creating new triangles or deleting existing ones.
There are a couple common view volumes, and when they are stretched into a unit cube they perform specific kinds of projection.
A projection is a mathematical term meaning a 3D space (our world) becomes 2D (for our computer screen).
Orthographic projections happen when the view volume is a rectangle or cube. Things in the distance appear just as big as things in the foreground, which means no sense of depth is rendered. This is good for 2D games!
Perspective projections happen when the view volume is a frustum (a pyramid with the pointed top chopped off). Depth is rendered, things in the distance appear smaller than things in the foreground, just like real life!
Everything outside of the unit cube is thrown out.
If something is partly in the unit cube, it is cut along the plane(s) it's intersecting to make new triangles so that -everything- is in the cube.
Clipping is the process of cutting up a primitive so that it fits in the cube.
The unit cube is scaled so that it fits the screen. The x and y coordinates of all the vertices in the unit cube are mapped to be between 0 and the screen's width and height respectively. The z coordinate remains between -1 and 1 and is passed along.
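As a rough formula for that mapping (my own summary, assuming the cube's x and y run from -1 to 1 and the screen is w by h pixels):
x_screen = (x + 1) / 2 × w
y_screen = (y + 1) / 2 × h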
Finally the image is rendered given all the 3D data and transformations from above. We're working with the resulting unit cube and the 3D objects in it.
Differentials for the triangle are calculated (no further information is given, I don't know what this means), and other per-triangle data is computed, e.g. for interpolation.
Also called Scan Conversion.
For each pixel with its center (aka its sample) located in a triangle, a fragment is generated. Data from the triangle from the geometry stage and interpolated data is saved with that fragment.
Also called Pixel Shading.
This is a custom program to modify colors of the fragment and change the ultimate color the 3D object will have.
Per-pixel data is computed here. You can do cool things like texturing and even cause effects based on where in the screen the pixel is!
Each fragment writes out some color data to a color buffer and its z value to a z buffer and stencil value to a stencil buffer. Which color gets saved to the color buffer is determined through this stage. The most common technique is to compare depth values.
If the current fragment color being written is from a triangle far away, and a fragment was already written to this pixel location from a closer triangle, the color is not saved. This results in closer objects being drawn on top of distant objects.
This stage is very configurable. You can change how the depth test works (maybe farther stuff gets drawn on top, or the depth test is ignored), if and how a stencil test works, or maybe the color being written should be mixed with whatever color is already there!
Also called Raster Operations. These are tools and techniques to modify how colors are written to the color buffer.
Alpha (transparency) data can be saved and tested on too. A common technique is to remove fragments that are fully transparent so they don't overwrite the z buffer.
A stencil buffer can be written to as well, then read and tested against to see if colors should be written to the color buffer. You could draw a circle to the stencil buffer and then set a test so that only pixels inside that circle are rendered.
This can either refer to the color and z buffers as a single package, or refer to a buffer of images to do effects like motion blur and anti aliasing.
This section covers the configurable and programmable parts of graphics rendering. It covers some of the same topics from the last section, but is a little more technical.
In software and hardware design, a pipeline is a specific kind of system where data flows from one module to the next, where each module performs some tasks and modifies the data. Some also say the data is filtered as it goes along and these filters can be swapped out without changing the entire system.
+ means the stage is configurable.
* means the stage cannot be modified at all
The rest of the stages are fully programmable.
Each single draw call will cause this entire pipeline to run once for all the primitives passed into the draw call.
High end GPUs are generally made by 2 companies - NVIDIA and AMD.
The graphics card is honestly like a little computer. Many companies use the chips designed and built by these makers to build the physical card itself. This is why there are mostly 2 brands/types of cards but dozens of brands to buy from (Radeon, Asus, etc.)
The GPU can only understand machine code, which is defined by the chip designers (Nvidia and AMD), and they both have different machine languages.
The problem is graphics programmers need to write code for both of these chipsets, and writing assembly (or especially machine code) is incredibly impractical.
The solution is to use a software design called abstraction and virtual machines.
Abstraction means we are generalizing how the chipset works and using a higher level language to write code. The tradeoff is we no longer have direct access to the processor and lose the potential for extreme optimization, but we have a much easier time writing code.
A virtual machine means we are representing the chipset as a simplified, generalized processor instead of worrying about the exact details of the specific processor and card.
A graphics driver translates the higher level language into machine code instructions for the GPU.
These are all higher level languages, the virtual machines we can write code to.
Some may offer certain tools the others don't have, or certain optimizations, but at the end of the day it's like Java versus C#. They both offer the same basic and advanced tools, just under different wordage.
There's tons of fascinating history between OpenGL and DirectX that I want to read about.
Vulkan is the newest language and it offers a lot of specific kinds of optimizations and appears to be the future of graphics programming.
You may come across this term when looking at tutorials or software. An OLD piece of software called GLUT advertises itself as an OpenGL binding for C++.
A software binding means a tool that allows a software language (C++, Java, C#) to use a graphics language (OpenGL, DirectX).
These apply to all graphics languages.
There are a set of coding primitives that the GPU understands.
Programmable shaders have flexible inputs that come in a few varieties.
Remember these are inputs to the GPU. Programs you write can use a more standard set of types - float, int, matrix, and others. This is simply how to send additional data to the rendering pipeline during a draw call.
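A tiny sketch of what those input varieties look like in a shader (GLSL here since that's what I've used; every name below is made up for the example):
// uniform inputs: the same value for every vertex/pixel in a single draw call
uniform mat4 modelMatrix;
uniform float time;
// per-vertex (varying) inputs: different for each vertex, pulled from the vertex buffers
in vec3 position;
in vec4 vertexColor;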
Outputs are very constrained and have fewer options. (The book really cut it short here…)
A big part of graphics programming is trading off render speed for quality, but as a student I want to emphasize the need to focus on function over optimization. Get your demos working and running at all, then look at optimizations. Don't do complex optimizations right away.
Flow control means using if, else, and switch statements to create branching logic in a program.
CPUs are amazing at this; GPUs are not so good at it because they operate on multiple pieces of data at the same time, so they can't easily manipulate some data and not others.
The exception is static control, which means a uniform value is used as the conditional so the program will know which path to take for all data for a single draw call.
Dynamic control uses varying inputs for the conditional so the program could go either way for each pixel or vertex. The GPU handles this by calculating the results for both paths, then throwing one result out depending on the condition.
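A quick GLSL-flavored sketch of the difference (useHighQuality and brightness are made-up names):
uniform bool useHighQuality;  // static control: one value for the whole draw call
in float brightness;          // varying input: different per vertex/pixel

void main() {
    if (useHighQuality) {
        // every invocation takes the same branch, so this is cheap
    }
    if (brightness > 0.5) {
        // dynamic control: the GPU may evaluate both paths and throw one result out
    }
}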
Shaders can be compiled ahead of time or when the program runs. The code is stored as a string.
In my experience with GLUT and SFML, the input is a GLSL file to the library/binding, and if it compiles correctly then the program is stored on the GPU and you get an ID to identify it. You can compile multiple shaders and swap which shader program to use with the IDs.
The first programmable step is the vertex shader.
Before the vertex shader, though, input is assembled. This means input vertex position data is paired with any other vertex data, like a vertex having a color or UV position, or both! The utility is being able to swap out color or UV data but use the same mesh.
Instancing is when the same input mesh is drawn multiple times in a single draw call, but some other associated input data may vary per instance.
The vertex shader only operates on vertices and any associated data with the vertex. It does not create or modify the triangles or the mesh. It focuses on changing colors, texture coordinates, normals, and ultimately returning its final clip space position (implying any projection techniques happen here too).
Input vertices are treated independently and have no data on the other vertices in the draw call.
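A minimal GLSL vertex shader sketch of that "one vertex in, one clip-space position out" idea (the matrix and attribute names are my own, not from the book):
#version 330 core
uniform mat4 modelViewProjection;  // model/view/projection transforms combined
in vec3 position;   // object-space vertex position
in vec2 uv;         // texture coordinate paired with this vertex
out vec2 vUV;       // passed along to later stages, interpolated per pixel

void main() {
    vUV = uv;
    gl_Position = modelViewProjection * vec4(position, 1.0);  // final clip-space position
}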
As input, the geometry shader receives one of the 3 primitives - a point, line, or triangle. Other primitives can be defined and then used (apparently). Additionally, the adjacent points, lines, or triangles can be made available to the program.
The program outputs points, polylines, and triangle strips. You can also output nothing, causing the primitive to be deleted. There is a rough upper limit of creating 1,000 primitives per invocation, so don't use this for tessellation, but a few copies are okay.
The shader defines which type of input it wants and which type of output it will result in. They don't have to match, so a mesh defined using triangles can output points or lines.
You can even modify vertex data here. Similar to the vertex shader though, the output position of each vertex is in clip space.
Stream Output is a new feature where data from the Geometry stage is sent back to the application as an array of data. Normally the data goes straight to the Rasterization stage. Rasterization can be turned off if wanted! This turns the GPU into a general data processor that specializes in SIMD operations!
This program operates on one screen pixel at a time. When a triangle covers a pixel's sample point (usually its center), a fragment is generated and this program processes it. The program returns a color which will be stored and merged into the color buffer during the merging stage.
Input data comes directly from the vertex or geometry shader. 16 or 32 vectors may be used as inputs respectively. Additional inputs like the screen position of the pixel and whether the triangle’s front or back is facing are available.
Similar to the vertex shader, the program does not know about any other pixels. The exception is for differential functions (also referred to as gradients, these can calculate how a value changes from pixel to pixel along either the x or y direction). This is useful for filtering and edge detection.
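In GLSL those show up as the dFdx/dFdy (and fwidth) functions; e.g. how an interpolated UV coordinate changes between neighboring pixels (vUV is an assumed input name):
vec2 changeAcrossX = dFdx(vUV);   // how vUV differs from this pixel to its horizontal neighbor
vec2 changeAcrossY = dFdy(vUV);   // how vUV differs from this pixel to its vertical neighbor
vec2 totalChange   = fwidth(vUV); // abs(dFdx) + abs(dFdy), handy for edge detection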
The depth value (generated in the rasterization stage) can be modified here. The value in the stencil buffer (also from the rasterization stage) can only be read from.
You can generate no output, but that forces poor optimization from the GPU (hide the mesh other ways if possible).
The fragment shader can output to multiple buffers, not just the one color buffer meant for the display, in one execution!
A very efficient technique where we would otherwise need to run multiple passes.
Again, output colors are combined with whatever is on the frame buffer at the moment. Either colors are overwritten or they can be multiplied, added, subtracted, combined bitwise… there are so many options.
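The most common mix is standard alpha blending, which boils down to:
final = source_color × source_alpha + destination_color × (1 - source_alpha)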
If MRT is being used, then the blending applies to all the buffers too. Though there are new options to allow different render targets to have different blend options.
Taking a step back, look at a GPU pipeline with specific vertex, geometry, and fragment shaders, merge settings, and maybe even multiple passes. The entire setup is usually grouped together as a single Material. It can also be called an effect: a single, specific visual effect the configured system makes, using a defined lighting equation with specific expected/allowed inputs.
Tools exist that encapsulate this idea of an effect where a single file contains all the instructions and programs for the GPU.
Unity's shader language and shader files operate very much like this. A shader file defines passes, vertex and fragment programs, input and output data, and so on. Then, in the editor, you can create a material that uses a custom Unity shader. In this sense, a material instance is a use of a shader with assigned, specific data provided.
For example, Unity's default material uses a default lighting shader. Each unique model will have its own unique material assigned to it that will provide the color and texture data for the model.
Setting up an OpenGL binding and window with a draw loop is a pain (albeit an educational, one-off pain). An engine does that for us and simplifies writing code.
Unity is one engine (and it runs games too!), but things like SFML (technically more of a multimedia library than a full engine) fill a similar role.
Depending on the engine, you'll likely be writing Effect files which contain some pure shader code for the vertex, geometry, or fragment shader. The rest of the effect file will be engine-specific code telling how to use the effect.
This is why I find it best to study general GPU theory rather than a specific engine, because in order to understand and use the engine you'll need to know the underlying system.
You'll often see Effect code like
struct appdata {
    float3 position : POSITION; // object-space vertex position, filled from the mesh data
    float3 normal : NORMAL;     // object-space vertex normal
};
Semantics are those all caps words after the colon. They signify where the data comes from or where the data is going. Like our struct appdata here is storing our input mesh position and normal vertex data.
These are assigned during the input pairing stage. This is how the GPU knows to assign the data in this struct these values.
Semantics is one of the few BIG changes between DirectX and OpenGL. OpenGL does not use semantics at all because OpenGL compiles the vertex, geometry, and fragment shaders into a single program (so all inputs and outputs are already known and where they go). DirectX compiles them separately so it's important to define where the inputs and outputs are coming and going.
The names of the semantics and what data they provide are totally dependent on the engine you're using. DirectX provides a set of them on its own, and a good engine will include and then build off of these. So simply pray to whatever goddess you hold dear that the information is well documented. Or, best bet, find examples and start building a public library of semantics.
This is a generalized, quick section that introduces words you will see in the field, but explained simply and without much detail as to how and why.
Let's have fun and explore making things, then, get into detail about subjects you wanna explore!
This is a math heavy section and the ideas are covered more in detail in my linear algebra notes and quaternion notes.
I'm also tired of getting bogged down in trying to fully understand the math at play, when really all you need to know is the definitions (so you can understand-ish what is being conveyed by technical write-ups) and a very general sense of why the things are used.
There's a common flow whenever a texture is applied to a mesh:
Typically when building a 3D mesh you will set specific UV values; however, you don't always need to provide them, and you could instead use an algorithm to generate them in one of the programmable shaders.
Examples of algorithmic projector stages are environment mapping and tri-planar mapping.
Here the UV values are transformed, modified, and prepared for fetching data from the texture. Common tools like stretching and rotating the texture can happen here.
Many shading languages use the range [0, 1) for UV values, where (0, 0) represents the origin (usually the bottom left corner) and (0.999999…, 0.99999…) represents the top right corner.
What happens to values beyond this range can be toggled in one of the programmable stages. UV values can be…
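The usual options are repeating (tiling), mirroring, or clamping to the edge. A rough GLSL-style sketch of what those behaviors effectively compute for a single coordinate u (my own summary):
float repeated = fract(u);                              // tile: 1.3 becomes 0.3
float mirrored = 1.0 - abs(2.0 * fract(u * 0.5) - 1.0); // bounce back and forth between 0 and 1
float clamped  = clamp(u, 0.0, 1.0);                    // stick to the edge texel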
Repeating (or mirroring) textures is a standard practice to fill environments, but it can lead to situations where the tiling becomes obvious and kills the immersion or feel of the scene.
One solution is to combine multiple textures to create hundreds of unique possibilities. Or to use a set of tiles that all share the same border so they can be used interchangeably.
This stage seems straightforward - convert a UV value to a texel and return it, but there is some nuance.
Let's say we're applying a 256x256 texture onto a square. As long as the square is projected to roughly 256x256 screen pixels big, everything is great! We get problems when the square becomes smaller (minification) or bigger (magnification) than 256 screen pixels big. The texture will get distorted, so how do we get around that?
The easiest solution is just to grab the nearest texel, but this can result in a lot of aliasing. Some more refined approaches are bilinear interpolation and cubic interpolation, where an array of nearby neighboring texels are fetched and weighted to calculate an average color. These methods are more effective at the cost of more GPU work.
Another fix is to apply high res detail textures on top of the magnified texture. Or using vector graphics (SVG files) which can be magnified losslessly.
Nearest Neighbor can be used again, but will result in worse artifacting because so many texels may influence a single pixel.
Temporal Aliasing refers to aliasing that flickers or shimmers from frame to frame, e.g. getting worse as the object moves further from the camera.
Bilinear interpolation is better but still causes aliasing. This problem is harder to solve because it would require sampling tons of texels per pixel. A proper solution is to compute some modified textures beforehand using some more costly algorithms, then swap in these minified textures.
Mipmaps are simply a list of minified textures. Level 0 (the original, unmodified texture) is scaled down, using whatever minification algorithm, to half its size in each dimension. This repeats all the way down to a 1x1 pixel image. Ideally these images are created before rendering, outside of the GPU.
Deciding which submap (a single mipmap level) to use is calculated at runtime. The process is called Level Of Detail (LOD) or lambda. You calculate 4 values - how the U and V values change as the screen’s X and Y values change, then take the absolute biggest value. These calculations are provided to us as gradients in shader programs.
The LOD value is a number corresponding to a mipmap level. 0 being the lowest and represents no minification. Fractional values may be used and it means the returned color value from the texture is linearly interpolated between 2 different mipmap levels.
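A sketch of the LOD calculation as I understand it (GLSL fragment shader; tex and vUV are assumed names):
uniform sampler2D tex;
in vec2 vUV;

float computeLOD() {
    vec2 texSize = vec2(textureSize(tex, 0));  // texture dimensions in texels
    vec2 dx = dFdx(vUV) * texSize;             // texel-space change across screen x
    vec2 dy = dFdy(vUV) * texSize;             // texel-space change across screen y
    float rho = max(length(dx), length(dy));   // biggest rate of change
    return log2(rho);                          // 0 = full-res level, higher = smaller mipmap
}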
The flaw of mipmaps is that they will favor the smaller, blurrier image when presented with wildly varying gradient values. Imagine a camera sitting on a long table. The table pixels close up will want a low LOD value, but table pixels far from the camera will want a higher LOD value. Even though mipmapping is sampled per pixel, close-up pixels will still pay extra attention to the high rate of change of texel values as the screen y increases, resulting in a low-res mipmap for close pixels.
A variation of mipmaps is the Ripmap - textures are also minified down as rectangles, so that it can better handle heavily skewed sampling like the problem above describes. But this still doesn't cover all problems and costs quite a large amount of memory to store all the textures.
An alternate algorithm to mipmaps is Summed Area Tables (SAT). We calculate and store a running color sum at each texture pixel, then when a screen pixel is mapped onto the texture, a fairly accurate average color for that exact rectangular area can be returned quickly! This algorithm still fails when the texture is very skewed and rotated, resulting in a rotated rectangle being mapped to the texture; the calculation cannot handle rotations, again resulting in blurring.
Both Ripmaps and SATs are types of Anisotropic Filtering Algorithms - algorithms that retrieve texel values over areas that are not square.
However, in the infinite wisdom that is graphics rendering jargon, there is one highly used algorithm today called…
Anisotropic Filtering - it uses the same algorithm as mipmaps, but takes multiple samples instead of just one at the center.
This process produces better results than just mipmaps and doesn't require any additional texture memory space!
These textures are made up of some number of 2D slices of textures. Some medical imaging tools produce 3D volume textures like this.
They certainly require a lot of memory, so they're really only useful for specific situations that would benefit from a volume texture, like volumetric lights, or a carved marble or wood statue model.
If a render has issues with seams, volume textures can help since parameterization is very straightforward since we're not projecting down to 2D.
6 2D textures are stitched together to form a cube, where the textures face inward toward the center of the cube. Ideally, the textures all seamlessly flow at the edges so the faces of the cube are hard to notice.
The cube map is sampled very differently. A 3D unit vector is used to fetch a value. The vector starts at the center, inside the cube and points outward to one of the “walls” of the cube. Simply imagine a vector representing the angle someone is looking at from inside the cube. I have a thorough write up on using cube maps here.
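In GLSL the lookup is just a samplerCube and a direction (skybox and viewDirection are assumed names):
uniform samplerCube skybox;
in vec3 viewDirection;   // points outward from the cube's center
out vec4 fragColor;

void main() {
    fragColor = texture(skybox, normalize(viewDirection));  // fetch along the direction
}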
Cubemaps can use mipmaps and interpolation, but hardware cannot interpolate between cubemap faces, which can make the seams more obvious. You can still interpolate manually by writing your own shaders.
Another issue is that texels in a cube map become more skewed toward the corners of the cube as opposed to the center of each cube face. Again, this can be accounted for but not usually in hardware.
Textures take up a lot of memory (since it's image data) so a good caching algorithm on the GPU is needed. Usually a Least Recently Used (LRU) algorithm is used, much like a CPU's cache.
For optimization, it's recommended to group polygons by the textures they use.
Clipmapping is a technique to further save memory by partially loading mipmap data. An HD surface/Earth texture in a flight sim would only need a small bit of the level 0 texture, a bit more of level 1, and so on. The giant mipmap is clipped so the entire thing doesn't need to be uploaded. The book doesn't provide an algorithm, so how exactly this works is for further reading.
A handy solution to caching, uploading, storing, AND fetching would be to store the images in a nice compressed state that can be decoded easily on the GPU.
DirectX Texture Compression or Block Compression (DXTC or BC) - these describe a series of lossy compression algorithms that are pretty neat. The image is broken into 4x4 pixel blocks. Each block is assigned 2 RGB colors, and pixels can only pick from those colors, or a few interpolated steps between them. There are slight variations that allow for alpha and store more color fidelity, at the cost of worse compression. ATI-specific variants were made, called ATI1 and ATI2 (3Dc).
Ericsson Texture Compression (ETC) is OpenGL's variant. The image is broken into 4x4 blocks. The blocks are then divided into 2x4 chunks. Each chunk stores a single base color and selects a set of 4 predefined constants from a lookup table. Each pixel can choose one of those 4 constants to modify the chunk's base color. Produces nice results and compresses well!
Instead of fetching from an image file, you can write a shader to generate a color (based on UV values if you want). Graphics cards are optimized for image texture fetching and reading, so avoiding that can be a costly decision for real-time applications.
Nice use cases are for volumetric textures (like a carved marble statue) and dynamic effects like water ripples.
Textures can be animated by either rendering video data or changing UV values per frame. Or even blend textures over time thus having a texture change.
A fancy term to mean using texture/image data to modify how a material is processed, thereby allowing a single material to produce slightly different results over the same mesh.
A common case is a diffuse map meaning the main color of a mesh that's using a typical shading equation.
Giving a texture alpha values so that parts of it render transparently.
This can be used to put a decal on top of another mesh, or to create very inexpensive detailed illusions, like a bush with hundreds of branches and leaves that's really just a picture - render a single polygon plane with the texture on it. This illusion fails when the camera is rotated, but you can add an intersecting plane with the same texture to help keep the illusion a teeny bit. This is called a cross tree. Billboarding can further fix this, but that's discussed later.
Alpha Blending is a tool to allow rendering of objects with partial transparency, but it's a teeny bit costly since fully transparent pixels still go through the merging pipeline and are not discarded though they will not have any visible effect.
Alpha Testing is when a pixel is discarded during the merging stage because its alpha value is below some threshold value. Apparently the pixel's alpha value can then only be rendered as either fully opaque or fully transparent, which can lead to artifacts. Further apparently, this can be mitigated by computing the alpha map (thus allowing a range of alpha values) as a distance field which is discussed later.
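The test itself in a fragment shader is tiny; a sketch (albedoMap, vUV, and the 0.5 cutoff are my own assumptions):
vec4 texel = texture(albedoMap, vUV);
if (texel.a < 0.5)   // below the threshold: throw the fragment away entirely
    discard;         // nothing gets written to the color, depth, or stencil buffers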
Alpha to coverage or transparency adaptive anti-aliasing is a technique where a translucent (partially transparent) pixel is converted into a fully opaque pixel, but fewer samples are used (huh??? How does this help anything). Because the pixel is now opaque it will correctly obscure objects behind it.
This is a catchall phrase for techniques to create illusions that a mesh has a more detailed surface than just an image can provide, but less detailed than adding raw geometry.
Detail can be broken down into 3 scales - macro, meso, and micro. Macro is the 3D modeling part, the actual geometry. Here the goal is to create recognizable silhouettes and shapes. Micro describes shaders and specific lighting details, like how skin can do subsurface scattering or a polished piece of metal creates fine reflections.
Meso detail is anything in between and describes effects like facial wrinkles or the bumpy surface of brick and stone. The detail modifies data in the pixel shader.
Blinn introduced the idea of normal maps - creating an image that will modify a face's normal at a pixel level, allowing a per-pixel normal value which will affect lighting equations that (almost all) use the face's normal to determine factors like brightness, color, and shine/specular.
So we are modifying a surface's normal by some amount based on a normal map. While we could define the modification based on the model space, it simplifies things by instead modifying the normal by the surface's own surface-space. This is called the tangent space basis (a very fancy math term to mean a 3D space like model space). This is calculated per vertex and is stored as…
Tangent and bitangent vectors are values assigned to a vertex, much like a vertex normal, and determine the orientation of a normal map on a mesh's surface. The tangent is orthogonal to the normal, and the bitangent is a complementary vector that's also orthogonal to the normal (but not necessarily orthogonal to the tangent). To save space, the bitangent is sometimes omitted and then calculated in a shader from the normal and tangent. But this fails when the normal map is used on a mirrored/symmetric mesh, where the bitangent should point the opposite way even though the same normal map is used; a solution is to store the handedness of the vertex, which is then used to orient the calculated bitangent correctly.
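A sketch of building that tangent-space basis in a shader, with the handedness stored in tangent.w (the names and packing are assumptions, not from the book):
in vec3 normal;      // vertex normal
in vec4 tangent;     // xyz = tangent vector, w = handedness (+1 or -1)

void main() {
    vec3 n = normalize(normal);
    vec3 t = normalize(tangent.xyz);
    vec3 b = cross(n, t) * tangent.w;     // reconstructed bitangent, flipped by handedness
    mat3 tangentToMesh = mat3(t, b, n);   // takes tangent-space vectors into whatever space n and t are in
}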
Offset Map or Offset Vector Bump Map was created by Blinn and stores 2 signed values at each texel location: one to modify the normal along the texture's U direction, and another to modify it along the V direction (using the tangent space basis). The result is a non-normalized vector pointing somewhere slightly different from where the surface's normal usually would.
A Heightfield is a greyscale texture where white represents high and black represents low. Deviations to the normal are calculated by examining how the U direction of texels change and the same in the V direction. This is a common procedure and is referred to as the image's derivative since it's a result of how the image changes. There are a variety of algorithms and filters you can use to calculate the normal offset from a heightmap.
These methods have generally been replaced by normal maps.
Older systems call it dot product bump mapping.
Normal maps are a pre-computed form of bump map, like described in the section above. They give a surface normal value per pixel instead of calculating it based on some offset values.
The change occurred when data storage became bigger, so storing a full three-component normal per texel wasn't a concern anymore. Plus the savings in per-pixel computation is good.
The texture stores a value between [-1, 1] as a color, so [0, 255] with 128 mapping to 0. This is why a light blue color indicates no deviation of the normal - color (128, 128, 255) maps to a vector of (0, 0, 1) meaning straight up!
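Decoding that in a shader is basically one line (normalMap and vUV are assumed names):
vec3 n = normalize(texture(normalMap, vUV).rgb * 2.0 - 1.0);  // [0,1] color back to a [-1,1] vector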
So. As discussed in the previous section, normals are stored in tangent space (relative to the surface). We can either convert incoming light directions to this tangent space per vertex, or we can convert the tangent-space normal to world space. It's standard to convert to world space, because converting every light quickly becomes unwieldy once a scene has many lights, and surface-based reflections need to be in world space anyway.
Normal maps will always fail when looked at from a shallow, grazing angle, because the ridges and grooves don't actually distort the mesh. For example, the mortar between bricks in a wall will always be visible even though at shallow angles the bricks should stick out and generally hide it.
A technique to give surfaces the illusion of occluding (hiding) parts of themselves even though the geometry of the mesh isn't detailed enough to hide it.
The height of the texture is stored in a heightfield. The values will offset the texture coordinates for other texture data fetches, resulting in a different part of the texture being used than normal!
The calculation to offset the texture takes the returned height value and the view direction into account which needs to be in tangent space. The heightfield value can also be scaled and given a bias (a value added to every height value).
Again, there's a problem with shallow angles where a small change in the view direction can result in unwanted, big changes in texture coordinates. Also a new problem, stereoscopic rendering will often not work well since the calculation may return inconsistent depth/offset values for the same point (because of the 2 different view direction angles).
Given an original texture coordinate p, an adjusted (scaled and biased) height value h, and the view vector v (transformed into tangent space), the equation is: p_new = p + h × v_xy
That equation is technically called parallax mapping with offset limiting, because an earlier equation divided by the view's height (z) which caused erratic sampling at shallow angles and the new equation as shown above limits the amount of offset that can occur. As you can imagine, dividing by a number less than 1 will cause extreme scaling.
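As a GLSL sketch of that equation (heightMap, albedoMap, scale, bias, and the tangent-space view direction viewDirTS are all assumed names):
float h = texture(heightMap, vUV).r * scale + bias;   // adjusted height at this pixel
vec2 parallaxUV = vUV + h * viewDirTS.xy;             // offset limiting: no divide by viewDirTS.z
vec4 color = texture(albedoMap, parallaxUV);          // other texture fetches use the shifted UV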
Parallax mapping is cheap and works amazingly well and has become standard in real time rendering. However, the mapping equation tends to overcompensate (adjust the UVs too far) and fails for heightmaps where there are rapid, large differences between nearby texels.
Given a heightfield texture, the view ray is projected onto the texture surface as a line, and samples of the heightfield at regular intervals are taken along this line. Then, a new line is formed connecting these sampled heights where the height pushes the line above the texture. The intersection that's closest to the view ray is then used as the new UV sampling point. The calculated UV value is not one of the heightfield samples from earlier, but possibly somewhere in-between.
More view ray intersection points are generated for grazing angles to help avoid flickering issues or incorrect samples.
Also called Parallax Occlusion Mapping (POM) or Steep Parallax Mapping.
The heightfield is sometimes used as a depthfield where the highest point is the mesh's surface and the texture describes how deep it goes instead of how high.
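A rough sketch of the linear search described above, using that depth-from-the-surface convention (all names are assumptions, and real implementations vary the step count with viewing angle as mentioned earlier):
const int NUM_STEPS = 16;
vec2 stepUV = (viewDirTS.xy / viewDirTS.z) * heightScale / float(NUM_STEPS);
float stepDepth = 1.0 / float(NUM_STEPS);

vec2 uv = vUV;
float rayDepth = 0.0;
float surfaceDepth = textureLod(depthMap, uv, 0.0).r;
while (rayDepth < surfaceDepth) {      // march until the ray dips below the heightfield
    uv -= stepUV;
    rayDepth += stepDepth;
    surfaceDepth = textureLod(depthMap, uv, 0.0).r;
}
// uv is now (approximately) where the view ray hits the bumpy surface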
Normals are usually still provided via normal mapping, but to save memory they can be calculated using the heightmap in a way like this.
Relief Mapping has issues where the mesh ends, but the visual effect implies there should be more pixels rendered to the screen. This happens because fragments are only generated for where the mesh is on screen, and relief maps give the illusion of a more complex mesh. Shell Mapping extrudes the mesh so that more fragments are generated (the extruded mesh is called a shell).
Another issue is efficiency when sampling a heightmap texture with large unchanging sections that would be nice to skip over sampling wise. Some algorithms include cone step mapping and quadtree relief mapping.
Aka Displacement Mapping using a Displacement Texture. The displacement texture is the same as a heightmap.
Here, the work is done in the vertex shader instead! Vertices are modified based on the texture. I assume you would provide essentially a flat plane that would normally be one triangle, but is broken up into many triangles that all span a heightmap.
It used to be the case that accessing texture data from the vertex shader was inefficient and ill-advised, but the move to “unified shaders” and general hardware improvements have made this far less an issue and it's now a suitable method for adding detail to a mesh via textures.
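A sketch of that vertex-shader displacement (assuming a flat, finely subdivided plane and made-up names):
#version 330 core
uniform sampler2D heightMap;
uniform mat4 modelViewProjection;
uniform float heightScale;
in vec3 position;   // vertex on the flat, densely subdivided plane
in vec3 normal;     // direction to push the vertex along (straight up for a plane)
in vec2 uv;

void main() {
    float h = textureLod(heightMap, uv, 0.0).r;            // vertex texture fetch
    vec3 displaced = position + normal * h * heightScale;  // push the vertex outward
    gl_Position = modelViewProjection * vec4(displaced, 1.0);
}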
The biggest is collision detection: since collision engines will not know about the deformed mesh, physical objects can interact with it strangely, like a ball rolling smoothly along a cobblestone road, seeming to ignore any large stones and gaps.
Similarly, animations can be tricky to build.
Blinn, Phong, Lambertian… these are all old school lighting concepts. Some are from the 1970s! We can do more, do better!
The chapter starts off with two HEAVY sections on the physics of light, the physics of color, and how computer monitors simulate colors. It's a very heavy, intense read that, tbh, did not mean much to me. It's important under certain circumstances, but I only care about modern 3D rendering tools and terminology, not how to build a monitor. Physically Based Rendering (PBR) will depend on physically based models of lighting, but we can gloss over the details. Unless we want to build a PBR engine from the ground up and totally comprehend what the equations do and why.
For now, some terms and definitions
Stands for bidirectional reflectance distribution function. It's simply a function that tries to more physically replicate how light works.
It examines the ratio of outgoing light over incoming light, where the result depends on the incoming light's direction relative to the surface and the outgoing, reflected light's direction.
The BRDF is usually written like so
Where the inner function is
This inner function returns an RGB color that is the differential of some outgoing light L over the differential of irradiance E times the cosine of the light's angle of incidence (clamped to 0 to prevent lights from behind lighting the surface).
This is known as the modern lighting equation. For each light in the scene, calculate that light's contribution (the inner function) and piecewise vector multiply it with the light's radiance color and the cosine, clamped to 0 again. Then sum up this calculation for every light hitting the surface.
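Writing that out in symbols (my reconstruction from the description above, so treat the exact notation loosely): the BRDF itself is a function f(l, v) of the incoming light direction l and the outgoing view direction v, and the full equation sums over the n lights in the scene:
L_o(v) = sum over k = 1..n of f(l_k, v) ⊗ E_k × cos⁺(θ_k)
where ⊗ is the piecewise multiply, E_k is light k's radiance color, and cos⁺ is the cosine of the angle of incidence clamped to 0.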
The BRDF is a function of 4 scalars - 2 each for the incoming direction and outgoing direction. Each direction has an angle above the surface and an azimuth (angle about the normal).
In the end, the BRDF is a complicated equation. Attempts have been made to simplify that complexity by creating variations on the BRDF, like the Lambertian BRDF model. Different models have different lighting properties and advantages.
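For the simplest case (a Lambertian BRDF, which is just the surface color divided by pi), the whole sum in a shader looks roughly like this (lightCount, lightPositions, lightColors, worldPos, worldNormal, and albedo are assumed names):
const float PI = 3.14159265;
vec3 outgoing = vec3(0.0);
for (int k = 0; k < lightCount; ++k) {
    vec3 l = normalize(lightPositions[k] - worldPos);       // direction to light k
    float cosTheta = max(dot(worldNormal, l), 0.0);          // clamped angle of incidence
    outgoing += (albedo / PI) * lightColors[k] * cosTheta;   // f(l,v) ⊗ E_k × cos⁺
}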
The book continues to thoroughly discuss concepts of reflectance and micro geometry and microfacets, and attempts to create equations that model reality. It is very math heavy and worth reading if you want to implement your own BRDF shaders from scratch.
I skipped over the section on non-point lights i.e. area lights because it's basically more of the same thing, just rounding out the complexity by taking away the point light simplification.
In BRDF, there are 2 phases - calculating the incoming light and calculating how that interacts with the material to produce the outgoing light.
These 2 phases need to be done for each light, for each pixel! And if light sources are different types (directional versus area) and if materials are different (metal simulation versus translucent stained glass) then a shader needs to be made for each and every combination!!! Half-Life 2 had 1920 unique combinations!! Wow!
Looping over lights in a pixel shader, where light types can change dynamically, is too inefficient. Manually writing thousands of shaders is impossible too. The solution is to write one large shader file called an übershader that can be compiled selectively (with preprocessor tools) to produce the exact shaders needed!
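The "compiled selectively" part is usually just the preprocessor; a toy sketch (the application prepends lines like "#define POINT_LIGHT" to the source string before compiling, and all the names here are made up):
#if defined(POINT_LIGHT)
    vec3 l = normalize(lightPosition - worldPos);
#elif defined(DIRECTIONAL_LIGHT)
    vec3 l = -lightDirection;
#endif
#if defined(USE_NORMAL_MAP)
    vec3 n = normalize(texture(normalMap, vUV).rgb * 2.0 - 1.0);
#else
    vec3 n = normalize(worldNormal);
#endif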
Another solution is to use multipass lighting where every object in the scene is rendered once using each light. New shaders only need to be written for each light type, but still one per material.
Deferred Shading is based on using render targets and performing all visibility testing before any lighting. The idea is to, in a single pass, output all the per-pixel shading data an object needs to be rendered. The depth data (z buffer), normals, texture coordinates, and material parameters are all output as separate render targets, called G-buffers (Geometry buffers). Next, passes are made that render the lights, using the G-buffers' data! A full-screen quad is used to ensure the entire window is drawn to. Multiple lighting shaders can be used, all reading the same G-buffer data, and their results can be mixed together.
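A sketch of the G-buffer-filling fragment shader (which buffers you output and how you pack them is up to the engine; these choices and names are made up):
#version 330 core
uniform sampler2D albedoMap;
in vec2 vUV;
in vec3 worldNormal;

layout(location = 0) out vec4 gAlbedo;    // render target 0: surface color
layout(location = 1) out vec4 gNormal;    // render target 1: world-space normal
layout(location = 2) out vec4 gMaterial;  // render target 2: misc material parameters

void main() {
    gAlbedo   = vec4(texture(albedoMap, vUV).rgb, 1.0);
    gNormal   = vec4(normalize(worldNormal) * 0.5 + 0.5, 0.0);  // pack [-1,1] into [0,1]
    gMaterial = vec4(0.5, 0.0, 0.0, 0.0);                       // e.g. roughness, metalness...
}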
There are many benefits to deferred shading over other attempts to make physically based rendering! Vertex programs are only run once, there's no need to shuffle geometry data and interpolate it repeatedly, and no need to pre-determine which lights affect the scene. Best of all, it drops the number of unique shaders down. You only need a shader per material (to generate the G-buffers) and one per light type. This also means it's much easier to play with new programs!
The downsides are that multiple render targets eats bandwidth and fill rate quickly for large, complex tasks. Even still, deferred rendering works well!
Ah FINALLY, the chapter where we examine the problem that has been interesting me for YEARS.
We know how to render an object in a simple way - incoming light(s) bounce off the surface and result in a color on the screen.
But how do we handle that definitive PBR (Physically Based Rendering) look where, clearly, light can bounce off objects and then illuminate others, the quality of ambient lighting is so good! How do you do this?
The book describes the problem as a recursive one without end. The color resulting in our render is the sum of all incoming rays of light hitting a point on a surface, and some of those rays are coming from other surfaces, whose exiting light ray needs to be calculated the same way… from rays of light calculated the same way, and so on.
This problem will be solved throughout the chapter in a variety of ways, each looking at specific techniques to simulate effects that a truly infinite light bounce would produce, like ambient occlusion and environment reflections.
A researcher lent their name to a notation scheme that works like regular expressions, but for describing rendering models.
A model can be represented as a series of steps where Light hits some objects then finally enters the Eye. The notation is read left to right.
The classic rendering model is represented as L(D|S)E. Light hits a diffuse or specular surface, then enters the eye. This shows that for anything to be seen, a surface must be hit by a light; lights themselves do not show up. If lights were handled by the classic model, then the notation becomes L(D|S)?E, where the question mark indicates there is either 0 or 1 surfaces hit by the light before being rendered. The 0-surface case means light directly enters the eye. If you think about it, in classic models there is no rendering of a raw light source itself. Only objects with geometry are rendered.
This notation simply provides a way to quickly summarize lighting models and how they work. Returning to our problem, it can be written as L(D|S)*E.
First some definitions
And some physical concepts
This refers to objects casting shadows onto a flat plane. The process is straightforward - use a matrix to transform the occluder onto the 2D plane surface. To make sure the shadow is always drawn on top of the receiver, either give the shadow a bias (translate it a bit above the surface) or disable the z buffer while rendering the shadow.
If you have translucent shadows, you can end up with the translucent shadows being drawn on top of each other, which is unrealistic. One fix is to use a stencil buffer. Render the occluder to the stencil buffer, incrementing the stencil value by 1, then draw the shadow only where the stencil buffer has a value of 1, forcing each shadow pixel to only be drawn once.
Imagine rendering a scene from the point of view of a light source. Whatever the light can see is lit, everything else is in shadow. Now, instead of rendering a scene, occluders are rendered as black colors onto a white texture. This texture is called a shadow map or shadow texture.
Since it's a texture, it can be used to wrap curved objects quite well! The downside is objects have to be marked as a receiver or a caster in the application/engine, casters cannot cast shadows on themselves, and the map has to be updated every time something moves in the light.
Aka volumetric shadows. When a point light hits an occluder, simplified down to a single triangle, the volume between the light and the triangle makes a 3D, 3-sided pyramid. Everything beyond the occluder triangle is still part of that pyramid, and makes up the occluder's shadow volume!
When a ray from the camera is cast through the scene, each time it passes through a front face of a shadow volume it's considered in shadow, and when it leaves through a backface it's out of shadow! Pretty nifty.
The downside is that pixels have to be drawn many, many times through many passes, generally only allows for hard shadows, and doesn't handle translucency well.
The scene is rendered from the point of view of a light. Only depths are recorded (only the z buffer), all else is turned off. Shadows can then be rendered by determining if the pixel is further away than the value in the shadow map. The shadow map basically contains z values that represent the closest object the light encountered, and thus the thing occluding the light. If a pixel is behind an occluder (has a greater depth) then it must be in shadow.
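The per-pixel test is basically one comparison; a sketch (shadowMap, the light-space position, and the bias are all assumed names):
vec3 lightSpace = lightSpacePos.xyz / lightSpacePos.w;       // project into the light's view
vec3 shadowUV = lightSpace * 0.5 + 0.5;                      // [-1,1] to [0,1] for sampling
float closestDepth = texture(shadowMap, shadowUV.xy).r;      // nearest occluder the light saw
float lit = (shadowUV.z - bias > closestDepth) ? 0.0 : 1.0;  // behind the occluder = in shadow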
The issues with this approach are that the depth buffer only has a certain amount of precision, causing shadow artifacts to appear when the light is far away compared to the viewer. The fix being an algorithm called LiSPSM which is a complex algorithm to transform the light's projection matrix to kinda match the view's so they both share a similar sampling pixel density.
I'm noticing an unspoken idea in the book - shadow maps are generally made for just one light, a single directional light like a moon or sun. I guess whatever algorithm is used, if multiple lights cast shadows then the algorithm has to be run for each?..
Cascade Shadow Mapping (CSM) is a technique to use multiple shadow maps at various densities (further from the camera, the lower the texel density) and have objects use different maps based on how far they are from the camera!
Creates soft shadows by sampling shadow map results around a point instead of just a single location! The samples (in shadow or not) are interpolated together, with the more samples meaning more granularity and softness.
The downside here is the large amount of time it takes to perform all the sample calculations, plus a few other artifacts and sub-calculations that have to be accounted for at each possibly shadowed pixel.
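Despite the cost, the loop itself is small; a percentage-closer-filtering style sketch building on the shadow map compare above (the 3x3 kernel is an arbitrary choice):
vec2 texelSize = 1.0 / vec2(textureSize(shadowMap, 0));
float lit = 0.0;
for (int x = -1; x <= 1; ++x) {
    for (int y = -1; y <= 1; ++y) {
        float neighborDepth = texture(shadowMap, shadowUV.xy + vec2(x, y) * texelSize).r;
        lit += (shadowUV.z - bias > neighborDepth) ? 0.0 : 1.0;
    }
}
lit /= 9.0;   // 0 = fully shadowed, 1 = fully lit, in between = soft edge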
Uses math to calculate a difference between occluder and receiver that allows the depth map (shadow map) to be squared, filtered, and otherwise used like a texture so it's much more optimized for rendering and produces great results! This is a good tool for environment/terrain shadowing.
Main downside is artifacts where light will “leak” through a shadow because an occluder is shadowing another shadow.
Shadows need to fill cracks and little spaces of mesh to provide realism. Ambient light is general, directionless light that just exists. Ambient Occlusion is shadowing that is caused by this light.
The theory for shading this is examining how visible a point on a mesh is to the outside world. A section in a crevice or corner will be very occluded so should be darker than a point on the outside of the mesh. The more ways a point can receive light from the world, without occlusion, the brighter and less affected by ambient occlusion it is.
This works well for models, but won't work for scenes with multiple separate objects. In that case, we can do ray casts and see how far away other intersections are. If they're far away, no or low occlusion! Taking the casts to infinity will always result in occlusion for enclosed spaces, so a maximum ray length is needed.
True ambient occlusion is an unsolvable, infinitely recursive problem because light bounces so many places and outgoing radiance can't be calculated until other radiances are calculated, which can't be calculated until…
An actual implementation technique uses the z buffer after initially rendering the scene. For each pixel, nearby depths are sampled randomly around it in a sphere. The ratio of hidden to visible samples determines the occlusion. Although good results need 200+ samples, which is too much to process, a solution is to use at most 16 samples and then blur the result. Another solution is to use an unsharp mask algorithm on the z buffer.
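A very rough screen-space sketch of that idea, simplified to 2D screen offsets (depthTex, the sample offsets, radius, and bias are all assumptions):
float centerDepth = texture(depthTex, vUV).r;
float hidden = 0.0;
for (int i = 0; i < SAMPLE_COUNT; ++i) {
    vec2 sampleUV = vUV + sampleOffsets[i] * radius;         // random point near this pixel
    float sampleDepth = texture(depthTex, sampleUV).r;
    if (sampleDepth < centerDepth - bias)                    // neighbor is closer to the camera
        hidden += 1.0;
}
float ambientOcclusion = 1.0 - hidden / float(SAMPLE_COUNT); // mostly blocked = darker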
Reflections are images of a scene, mirrored about the reflective geometry. A straightforward and practical solution is to mirror the entire scene's geometry, render it, and use that as a texture to render the mirror. Similarly, you can render the scene from the camera's mirrored point of view and use that as the mirror texture. Both achieve the same effect. It's recommended to use the stencil buffer so that only geometry with the mirror material render the mirrored geometry.
An issue of geometry that cuts through a reflective plane (like a rock sticking out of water) will cause the rock geometry below the water to mirror entirely so that the part under water now appears above it, which is incorrect. A fix is to use a custom clipping plane (somehow) right at the reflective plane that will ensure objects below it do not get rendered.
Once you have the reflected environment map as a texture/image, you can do so many techniques on top of it like using bumpmap techniques for rippled water or textured/frosted glass!
For non-plane geometry, like a sphere or creature, the most accurate technique is to use ray tracing (discussed later) or render environment maps (EM) each frame, using the previous frame’s EM to make the current one. This produces a nearly recursive EM, which is good and accurate.
Fancy word for a more physical/real take on translucency. The concept is to have the material act more like a filter - incoming rays only output certain amounts of colors along the light spectrum. Thicker parts of the mesh should filter more light than thinner parts as well.
The Beer-Lambert Law provides a math formula to derive how much filtering occurs based on the model's thickness, and someone named Bavoil introduced a simplification of that equation for real-time rendering. All that really needs to happen in the shader is to calculate how thick the model is along the current view ray (per pixel). This can be done by first rendering the back faces to the depth buffer, then reading those values when rendering again like normal.
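The law itself (as I remember it) is an exponential falloff, transmitted = e^(-absorption × thickness), which in a shader is one line (names assumed; absorptionPerColor is a vec3 so each channel filters differently, and thickness comes from the back-face-depth trick above):
vec3 transmitted = incomingLight * exp(-absorptionPerColor * thickness);  // thicker = darker, per channel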
The effect of light being curved when it changes medium, like how a straw in a cup of water will look bent.
Dispersion is a property where different wavelengths are bent by different amounts. This is how prisms form the iconic rainbow effect. To replicate this would mean managing many rays of light being made from a single light ray, which is too much to ask for.
Back to refraction...
One solution is to render a cubemap environment map at the location of the refraction model, then refer to that map when the model itself is rendered.
This doesn't account for backfaces, which is usually fine anyway. A technique using ray tracing considers backfaces and total internal reflection if desired.
A fancy term for when light is focused into specific spots or designs, like through a magnifying glass or circular designs from a cup of water.
An image based solution works like a shadow map. The scene is rendered from the point of view of the light, and whenever a ray passes through a reflective or refractive object and the light bends, that location is marked. The result is called the photon buffer. Spheres representing the accumulated light, called splats, are generated from the photon buffer. The splats are rendered in a second pass, transformed into the camera's point of view, into a caustic map, which is rendered in the final, third pass.
Other physically based algorithms exist like caustic volumes, but sometimes a simple hack provides visually pleasing results. Like a caustic texture being projected onto the scene. A special algorithm for water/sea fanatics is to ray trace from the sea floor up to the surface where the ray is then bent. The more bent the ray, the less light drawn at that point on the floor.
Subsurface scattering refers to when light changes direction after entering a physical object before it is emitted from that object. When the light doesn't change direction too much, like a creature's skin, it's handled by lighting equations like BRDF. Global means the light has traveled a longer distance, more than a single pixel.
Physically, light is scattered differently depending on its wavelength. For example, air scatters blue more than red light, giving the appearance of blue sky.
When light is scattered, it can either bounce once inside the material, or many times. Rendering algorithms are grouped up by single bounce or multi-bounce methods. Multiple scattering leads to more pleasing results, though. As light is scattered, more of its energy is lost (absorbed) into the material.
A technique where light is wrapped around a curved object, softening the transition from light to dark. Changing the hue of the light, like more red for white/light skinned humans, works well here.
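One common formulation I've seen (not necessarily the book's): instead of clamping dot(n, l) at 0, let it dip a bit negative and rescale, so light "wraps" past the 90-degree mark:
float wrap = 0.5;  // 0 = normal Lambert falloff, 1 = light wraps all the way around
float diffuse = max((dot(n, l) + wrap) / (1.0 + wrap), 0.0);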
This comes from the observation that specular reflectance does not change with subsurface scattering and only cares about the normal map. Diffuse lighting (diffuse reflectance) can be used to simulate subsurface scattering by having it ignore the normal map, or use a blurred version of it.
A model's diffuse color is rendered to a texture, then is blurred multiple times using different color filters. The results are then combined during a final shading. This technique is expensive because of all the rendering.
This section covers algorithms that combine all of the thoughts on Global Illumination so far. Algorithms that simulate a full, physically based, rendered scene. Typically it's a bit too much work to do these in real time, but the algorithms are instead often used against the static content in a scene, compiled and saved before the game runs, and then referred to at run time during real time.
Named after the first program used to demo the technology. It's a process of computing ambient/global light by allowing all diffuse surfaces to send out light into the scene. This way, light truly bounces off of surfaces as in real life to light up the entire scene.
The algorithm is to first generate patches on top of generalized spots on the surfaces in the scene. These could be vertices or polygons or whatever. A hemisphere above each patch is used to determine how much light travels from that patch to every other patch. A form factor value is calculated for each patch-to-patch pair and denotes how much relative light travels between them. Distance between patches, whether anything is blocking them, and which way they face are all considered in the calculation.
Handling infinite patch emissions is impossible, instead you cut it off at some point. One solution is to first have the lights shoot out to the patches and whichever patch is brightest then will be the next to shoot out light. Now that patch shoots light and whichever other patch is brightest takes its turn. This repeats some number of times and significantly reduces the number of patches to process while focusing on the most important (brightest) patches.
This algorithm is still reserved for offline rendering mostly, though it has seen real time use.
Ray tracing is exactly what it sounds like, and so far the book has only provided a brief introduction to the concept, not so much how to actually implement it on hardware.
Rays are shot out from the camera/eye, through each pixel to be rendered to the screen, and the rays calculate color data as they go. Color data is calculated using a variety of techniques discussed earlier, and new rays are generated pointing to each light the object is lit by, bouncing from one material to the next. Reflecting and refracting cause new rays to be generated as the process goes on, until we decide we're done bouncing. As a ray travels along, it checks to see if anything is hiding it from light sources (causing it to be in shadow) or if the thing obscuring it is translucent, causing the shadow to be colored light instead.
The upside to this technique is that it produces high quality physically-based visuals! It works on a variety of primitive types, not just triangles (e.g. curves).
There are several big challenges to this technique. The processing of all these rays doesn't fit the traditional GPU acceleration APIs - OpenGL and DirectX. They operate on the notion of processing meshes/triangles, producing fragments, then rendering some output data based on available information. Nothing about rays, handling object collision, reflection and refraction… Regardless, the parallel nature of GPUs does make them prime ground for processing this kind of data! Rays can be batched up into similar directions and processed together as such (Disney did a presentation on this, describing that's how their renderer for Zootopia partially worked).
Irradiance Caching or Photon Mapping. The scene is rendered from the view of a light, saving how much light hits a surface. Next, surfaces examine nearby surfaces and store the indirect light received. Finally, a ray tracer does its work, calculating light using the light sources and these photon maps.
These techniques are slow because they take so many calculations and, as of the writing of this 3rd edition, GPUs haven't quite become adapted to processing them.
A solution is to compute this heavy, fancy, realistic lighting using the static geometry and static lights of a scene first and then referring to those results when rendering.
An old school technique used back in Quake II days. Static lights are rendered into the scene and the geometry stores the light data indicating the light hitting it. The data stored can either be a light map texture that will then be applied to the geometry or even have the data stored per vertex.
Downsides are that the lights can't change, it doesn't work with dynamic objects, there are heavy memory storage issues, and the light has no directionality component after it's saved.
The concept is simple - add directional data to the stored prelight data (kinda introduced above). The solutions are quite complicated though. There are a variety of techniques that focus on how to store the data so it can be used during rendering.
One technique focuses on storing the data like a normal map, but these approaches are all very math heavy and I cannot simplify them. Look into techniques like:
This is the technique of generating ambient light (indirect light) environment maps at specific points in the scene and then letting dynamic objects reference them, so dynamic objects receive ambient light as well. The challenge becomes interpolating between the environment maps as the objects move.
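One simple (made-up) way to do that blending: weight each nearby probe by inverse distance. A real system would interpolate full environment maps or spherical harmonic coefficients rather than a single ambient color, but the idea is the same.

    #include <cmath>
    #include <vector>

    struct Vec3  { float x = 0, y = 0, z = 0; };
    struct Probe { Vec3 position; Vec3 ambientColor; };   // stand-in for a full environment map

    // Blend the ambient light of the probes, weighting closer probes more heavily.
    Vec3 sampleAmbient(const std::vector<Probe>& probes, const Vec3& objectPos) {
        Vec3 result;
        float totalWeight = 0.0f;
        for (const Probe& p : probes) {
            float dx = p.position.x - objectPos.x;
            float dy = p.position.y - objectPos.y;
            float dz = p.position.z - objectPos.z;
            float weight = 1.0f / (std::sqrt(dx * dx + dy * dy + dz * dz) + 0.001f);  // inverse distance
            result.x += p.ambientColor.x * weight;
            result.y += p.ambientColor.y * weight;
            result.z += p.ambientColor.z * weight;
            totalWeight += weight;
        }
        if (totalWeight > 0.0f) {
            result.x /= totalWeight;
            result.y /= totalWeight;
            result.z /= totalWeight;
        }
        return result;
    }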
Just as lighting in a scene can be precomputed, so can information about how obscured parts of the scene are.
Focusing mostly on static geometry and static lights, a scene is processed for ambient occlusion, as described above, and results are stored. Possibly per vertex, or as textures.
Moving objects can take advantage of this too by computing AO (Ambient Occlusion) in a cube and saving the results as a cubemap. The map describes how the object affects nearby objects.
Animated objects take more work because they have different poses. One solution is to generate AO for a variety of important shapes and interpolate between maps or sample whichever is closest to the current pose.
This is additional work that preserves a directional aspect of the ambient occlusion. It works well with heightmaps and can produce nice soft shadows.
This is a topic on how to precalculate work for everything else in global illumination - reflection, refraction, subsurface scattering, etc. The name comes from how irradiance transfers into radiance (incoming light to outgoing light).
Skipped over because it's very math intensive and I'm getting tired of physically based algorithms.
This section essentially covers everything that cannot be modeled well with polygons - e.g. fur and clouds. Also topics on post processing and rendering a final image in certain non-photorealistic styles.
By simplifying to a fixed view (the camera doesn't move), a lot of rendering can be saved and reused. Once rendered to a buffer and resupplied per frame, any graphics buffer (G-Buffer) can be reused as long as the camera doesn't move. This allows for techniques like rendering a complex scene/mesh once and then rendering tools and UI on top of it to interact with it. This is what Computer Aided Design (CAD) tools do, allowing for measuring, annotations, and other things. Another use would be texture painting - painting right onto a 3D model.
Golden Thread, Adaptive Refinement, or Progressive Refinement are techniques where a scene is initially rendered at a quick, lower quality, then re-rendered at higher qualities as long as the camera doesn't move. The more detailed renders may use slower techniques and focus on just parts of the full display image at a time. These refinements can be swapped in any way wanted, either instantly or faded in slowly.
A cubemap texture is used to represent distant/far objects like the sun, mountains, clouds, etc. It's fast, efficient, and adds a lot of detail to a scene. The texture should be fairly high resolution to avoid artifacts.
In practice, the skybox is rendered using a cube mesh surrounding the scene, or a dome. Team Fortress 2 used a dynamic skybox system where a separate skybox environment map was rendered out of view of the player, then rendered as a skybox.
The concept of taking MANY real life photos of a single object or scene. The images can then be processed to create a 3D model just from the images, or to create a holographic-like effect where the object/scene can be rotated around as though it were a 3D rendered scene! As of this book's edition, there are not many real-time applications of this effect.
Light Field Rendering itself is using interpolation techniques to render new, computed, views of an object that a camera hasn't captured.
A sprite is a generalized term for any 2D image displayed onto a screen. A sprite could be a plane that's always view facing or rendered using some other mesh. Usually the sprite has some transparent parts to it so it's not just a rectangle.
A layer is a set depth on which many sprites exist. All sprites on a single layer are rendered at the same time, and layers are rendered back to front. This allows for nice parallax effects (distant objects move less than nearer objects as the camera moves horizontally) by moving the distant layer less than the closer one. It also makes zoom effects easy, by scaling the closer layers more than the further ones.
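A tiny sketch of the parallax idea, where each layer scrolls by the camera movement scaled by an invented depthFactor value.

    #include <vector>

    struct Layer {
        float depthFactor;    // 0 = infinitely far away (never moves), 1 = moves with the foreground
        float scrollX = 0.0f;
    };

    // As the camera pans horizontally, distant layers scroll less than near ones.
    void updateParallax(std::vector<Layer>& layers, float cameraDeltaX) {
        for (Layer& layer : layers)
            layer.scrollX += cameraDeltaX * layer.depthFactor;
    }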
A billboard is a sprite that always faces the viewer. The 2D sprite rotates as the camera rotates.
Screen-aligned billboards strictly always face the screen. These work well for things like text or purely 2D games.
Viewport-aligned billboards rotate a little, but generally face towards the camera. Technically they face the viewport, the view frustum. This purposefully creates foreshortening and perspective effects with the sprites. They may be a good option for imposters (images standing in for an actual 3D mesh).
Some advanced techniques of billboards involve using animated explosion sprites and repeating them in many ways, randomly, to simulate a real explosion cloud. Relief mapping can help make it look even more 3D. Clouds also can be made the same way, repeating cloud billboards and varying their transparency and transforms.
Soft particles are a billboard technique where translucent billboards dynamically change their transparency per pixel - the closer a pixel of the billboard is to the geometry behind it, the more transparent that pixel becomes. This solves problems where billboards intersect nearby geometry and kill the illusion that it's a volume effect like dust or smoke.
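The fade itself is usually just a small depth comparison like the sketch below; the names and the fadeDistance parameter are made up here, and the scene depth would come from a depth texture in practice.

    #include <algorithm>

    // Fade a particle's pixel out as its depth approaches the scene depth behind it,
    // so the billboard never shows a hard line where it cuts into geometry.
    float softParticleAlpha(float particleDepth, float sceneDepth, float fadeDistance, float baseAlpha) {
        float fade = std::clamp((sceneDepth - particleDepth) / fadeDistance, 0.0f, 1.0f);
        return baseAlpha * fade;   // 0 where the particle touches geometry, baseAlpha when well in front of it
    }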
Another artifact of billboards is when they suddenly pop out of existence as the camera moves, or characters pop in front of or behind them. A solution is to treat the area where the billboard takes effect as a volume and render the billboard more or less transparent as needed, always on top of objects until they've fully left the volume.
Axial Billboards have a set 3D axis they're only allowed to rotate along, but from there they always try to face the viewer. These can be used to create beam particle effects where the beam trail is the axis on which the beam billboard rotates.
Basically any system where really small, tiny primitives are generated en masse and given some behavior. It's possible to render them as axial billboards for thicker lines.
Imposters are sprite substitutes for 3D geometry. Either the object is substituted with a premade sprite, or it's rendered in real time to a texture. Rendering on the fly works better overall since you can generate a texture size that best fits the screen display and have the object oriented precisely.
Billboard clouds are collections of billboards used to represent one object or effect. Like a tree where branches and leaves or pine needles can be rendered each as a single billboard and their collection is referred to as a cloud.
Billboards and sprites can include depth information too; these are called depth sprites or nailboards. They can offset depth values at the pixel level when rendering the sprite, which can really help fix issues when the sprite is an imposter of a 3D object, or when you simply want faster rendering times for particles.
Post processing covers most any technique where an input image is worked on, modified, and some altered output is produced. It differentiates itself from everything else so far in that geometry is no longer the main source input.
Because of how GPUs still work, it's necessary to render a plane that covers the entire screen in order to trigger the pixel shader to run across the whole screen/image. It's silly, but it also leads to a potential tile optimization where the screen is segmented into tiles so only the tiles that need processing are processed.
Aka filters, kernels are a way to calculate a per-pixel value by summing neighboring pixels weighted by some amount. Gaussian and Sobel are names of popular kernels with specific weight values. For example, when a Gaussian kernel is applied to an image, it blurs the image.
Kernels have a set size, like a 3x3 or 5x5 grid. The weight values can be positive or negative. The center of the kernel represents the current pixel, with nearby pixels represented one grid row/column over. Each pixel is multiplied by its weight value and the results are summed up. This value is saved for the current pixel, and then the next pixel is processed.
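A bare-bones version of that process in C++ for a grayscale image, with a 3x3 Gaussian weight set; edge pixels are handled by clamping, which is just one of several options.

    #include <algorithm>
    #include <vector>

    // Apply a 3x3 kernel to a grayscale image stored row-major as width*height floats.
    std::vector<float> convolve3x3(const std::vector<float>& image, int width, int height,
                                   const float kernel[3][3]) {
        std::vector<float> out(image.size(), 0.0f);
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                float sum = 0.0f;
                for (int ky = -1; ky <= 1; ++ky) {
                    for (int kx = -1; kx <= 1; ++kx) {
                        // Clamp so the kernel never reads outside the image.
                        int sx = std::clamp(x + kx, 0, width - 1);
                        int sy = std::clamp(y + ky, 0, height - 1);
                        sum += image[sy * width + sx] * kernel[ky + 1][kx + 1];
                    }
                }
                out[y * width + x] = sum;   // the weighted sum becomes the new pixel value
            }
        }
        return out;
    }

    // A 3x3 Gaussian blur kernel; the weights sum to 1.
    const float gaussian3x3[3][3] = {
        {1 / 16.0f, 2 / 16.0f, 1 / 16.0f},
        {2 / 16.0f, 4 / 16.0f, 2 / 16.0f},
        {1 / 16.0f, 2 / 16.0f, 1 / 16.0f},
    };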
Bilateral filters use both the positions of neighboring pixels and the values of those pixels in the weighting process. This technique apparently has the ability to preserve edges (sharp changes in pixel color, e.g. white next to black), which most filters tend to blur the most.
A handy technique for computer vision algorithms, but it also leads to cool effects where objects are outlined as if they were inked in a comic. First, the image is blurred only along the horizontal direction, then a differential filter is run (comparing differences between neighboring pixels). Big changes mean an edge! It's that simple.
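A crude, illustrative version of that flow (a horizontal 3-tap blur, then a horizontal difference; values over a threshold count as edges). The threshold and the 3-tap blur are arbitrary choices here, not what the book prescribes.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Mark pixels as edges: blur horizontally, then compare each blurred pixel
    // to its left neighbor. Grayscale image stored row-major as width*height floats.
    std::vector<bool> detectEdges(const std::vector<float>& image, int width, int height,
                                  float threshold) {
        std::vector<float> blurred(image.size(), 0.0f);
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                int left  = std::max(x - 1, 0);
                int right = std::min(x + 1, width - 1);
                // Simple 3-tap horizontal blur.
                blurred[y * width + x] = (image[y * width + left] +
                                          image[y * width + x] +
                                          image[y * width + right]) / 3.0f;
            }
        }
        std::vector<bool> edges(image.size(), false);
        for (int y = 0; y < height; ++y) {
            for (int x = 1; x < width; ++x) {
                // A big jump between neighbors means an edge.
                float diff = std::fabs(blurred[y * width + x] - blurred[y * width + x - 1]);
                edges[y * width + x] = diff > threshold;
            }
        }
        return edges;
    }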
An image is redrawn or rendered with a different luminance (brightness) scaling across the image. Good lighting is so important for a render: too bright overall and it looks washed out, too dark and nothing can be seen. The goal is to use contrast well.
Inspired by how human vision adjusts based on the level of light around it, the goal is to remap the lightness of pixels in an image to fit a certain goal: a dark image will be brightened and a bright image will be dimmed, ideally while preserving contrast.
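A minimal example is the classic Reinhard operator, which squashes any luminance into the 0-1 range while leaving dark values mostly alone. The exposure multiplier in the usage comment is an assumed extra knob, not part of the operator itself.

    // Reinhard tone mapping: small values pass through almost unchanged,
    // very bright values approach (but never exceed) 1.0.
    float reinhard(float luminance) {
        return luminance / (1.0f + luminance);
    }
    // Usage (exposure is an assumed, scene-dependent multiplier):
    //   float display = reinhard(hdrLuminance * exposure);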
High Dynamic Range (HDR) is when lighting values are computed and stored beyond the usual 0-1 displayable range, which is exactly why tone mapping is needed - to bring those values back down to something the screen can actually show.
How to use the stencil buffer! It acts like a global rendering mask, so you can additionally test fragments against it to see if they should be drawn.
Unfortunately, writing to the stencil buffer also means writing to the color buffer; by default you can't write to just the stencil buffer (though color writes can be disabled entirely with glColorMask). You could output a purely transparent color, but that's inefficient. It's better to output a flat color and make sure it's overwritten later, or output a neutral/dark color so it's not noticeable.
Writing to the stencil buffer is similar to writing to the color buffer in that geometry is needed to generate fragments indicating which "pixels" in the stencil buffer will be written to. The difference is that it's not programmable like the fragment shader; instead you have a few options to play with.
Basic flow per render/effect:
At some point you will want to clear the stencil buffer between complete frames.
Use glEnable and glStencilFunc to set up the stencil test before rendering. This tells which comparison to use and gives a hard value (called the reference value) to compare stencil buffer values against. Also pass in a bit mask that is ANDed with both the reference value and the current stencil buffer value before comparing. For example, using the equals comparison, a reference value of 1, and a mask of 0xFF results in a stencil test where only stencil values of 1 are considered passing (see the sketch after this list).
Use glStencilOp to allow updating of the buffer. Tell it what to do when the stencil test fails, when the stencil test passes but the depth test fails, or when both tests pass. There are several options like keep (no change), replace (with the reference value that's also used for the test), increment or decrement by 1, or increment or decrement by 1 with wrapping.
Optionally, you can provide a bit mask via glStencilMask that determines which bits are allowed to be written to. Only 1 bits are allowed to be written; 0 bits are write protected. By extension, a mask of 0x00 means writing is effectively turned off regardless of the op settings.
That's it! It's up to you to figure out how to use it and what to do with the mask.
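For reference, here's a sketch of a typical mask-then-test flow using the calls above. drawMaskGeometry and drawScene are hypothetical helpers, and a real program would also set up shaders, buffers, and the rest of its state.

    #include <GL/gl.h>   // assumes an OpenGL context (and loader, e.g. glad/GLEW) is already set up

    void drawMaskedScene() {
        glEnable(GL_STENCIL_TEST);
        glStencilMask(0xFF);                        // allow writes to all stencil bits
        glClear(GL_STENCIL_BUFFER_BIT);             // start with an empty mask

        // Pass 1: write the reference value (1) wherever the mask geometry covers.
        glStencilFunc(GL_ALWAYS, 1, 0xFF);          // always pass the stencil test
        glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);  // when it passes, replace the stencil value with 1
        // drawMaskGeometry();                      // hypothetical helper, not shown

        // Pass 2: only draw fragments where the stencil buffer equals 1.
        glStencilFunc(GL_EQUAL, 1, 0xFF);
        glStencilMask(0x00);                        // write-protect the stencil while testing against it
        // drawScene();                             // hypothetical helper, not shown

        glDisable(GL_STENCIL_TEST);
    }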