D3D11 Vendor Hacks

1	This is a list of D3D11 vendor/driver hacks, inspired by Aras's list of D3D9 GPU Hacks. None of the features listed here are natively available in D3D11 at FEATURE_LEVEL_11_0, which is the maximum feature level supported on Windows 7.
2	The "support" columns indicate the minimum GPU on which you can use the listed extension(s) for that column. They do not necessarily reflect the actual capabilities of the hardware, only the functionality that's exposed through D3D11 extensions. Entries in these columns with a question mark are unconfirmed and are just my best guess at the moment. Please let me know if you have information about the supported hardware, or can help confirm hardware support for a particular feature.
3	This spreadsheet is maintained by Matt Pettineo (MJP) @MyNameIsMJP https://therealmjp.github.io/
4
5	========================= Rendering Hacks ==============================================
6
7	Feature	Description	NVAPI Function(s)	AGS Function(s)	IGFX Function(s)	NV Support	AMD Support	Intel Support	D3D11.x/D3D12 Support	References
8	UAV Overlap	Tells the driver to skip synchronization between draw or dispatch calls that use UAVs. Normally the driver will sync after each draw or dispatch that writes to a UAV, in order to prevent hazards when two threads try to access the same area of memory. Using this extension can allow multiple Draws/Dispatches to run in parallel on the GPU, and will also let you keep your QA team busy with difficult-to-repro sync bugs!	NvAPI_D3D11_BeginUAVOverlapEx NvAPI_D3D11_EndUAVOverlap	agsDriverExtensionsDX11_BeginUAVOverlap agsDriverExtensionsDX11_EndUAVOverlap	N/A	Fermi?	Southern Islands	None	Equivalent behavior can be obtained by omitting barriers in D3D12	https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#ga16f7cfc4d3c436b211f299341e25c801 https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gae22fcecf7799dfd5aae4bfd308e6444e http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gac3f34cbd997bdb51478ada50255a9dd7 http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaeb78a97e256f3c6c511451dded3994e5 http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaed6aaa526d5a4729d9524039eae4c825
9	Depth Bounds Test	Rejects all pixels for which the depth value already stored in the depth buffer falls outside of a range specified by a minimum and maximum depth. Originally developed for accelerating stencil shadows, but can also be used when accumulating deferred lights or projective decals.	NvAPI_D3D11_SetDepthBoundsTest	agsDriverExtensionsDX11_SetDepthBounds	N/A	Fermi?	Southern Islands	None	Optional support in D3D12 as of Windows 10 version 1703 (AKA Creators Update). NVAPI also has a D3D12 extension.	https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaf1635db8ecaaefa20b4950a9191fdcb6 http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga0502f9d58555b662a3b6fcc9b61b7d2a
10	Forced MSAA Sample Count	Forces a specified MSAA sample count regardless of the render targets and depth targets bound. Can be used to implement MSAA variants that don't require the full storage and bandwidth cost of MSAA render targets.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::ForcedSampleCount	N/A	N/A	Fermi?	None	None	Target independent rasterization provides equivalent functionality, available in D3D11.1 and D3D12 with FL 11_1	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
11	Programmable MSAA Sample Positions	Allows specifying the location of MSAA sample points within a pixel. Can be used to implement interleaved sampling, jittered sampling, or poor man's decoupled shading.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::ProgrammableSamplePositionsEnable NvAPI_D3D11_RASTERIZER_DESC_EX::InterleavedSamplingEnable NvAPI_D3D11_RASTERIZER_DESC_EX::SamplePositionsX NvAPI_D3D11_RASTERIZER_DESC_EX::SamplePositionsY	N/A	N/A	Maxwell 2.0	None	None	Optional support in D3D12 with multiple tiers as of Windows 10 version 1703 (AKA Creators Update). NVAPI has a D3D12 extension.	https://mynameismjp.wordpress.com/2015/09/13/programmable-sample-points/ http://www.geforce.com/hardware/technology/mfaa/technology https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_locations.txt http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
12	Conservative Rasterization	Causes a pixel to be shaded if any part of the pixel is covered by a primitive, instead of only testing at 1 or more sample points. Useful for voxelization, occlusion culling, analytical antialiasing, or tiled light binning. Note that using this will typically result in vertex attributes being extrapolated past triangle edges, since they will still be interpolated to the pixel center before shading.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::ConservativeRasterEnable	N/A	N/A	Maxwell 2.0	None	None	Optional support with multiple tiers in D3D12 and D3D11.3 with FL 12_1	https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
13	Quad Filling	Causes all pixels within a triangle's screen-space AABB to be shaded. Can also enable a mode where the entire viewport is shaded (Nvidia only). On AMD, vertex attributes are not properly interpolated, so only SV_Position will be valid.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::QuadFillMode	agsDriverExtensionsDX11_IASetPrimitiveTopology	N/A	Maxwell 2.0	Southern Islands	None	None. (NVAPI has a D3D12 extension)	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf https://www.opengl.org/registry/specs/NV/fill_rectangle.txt http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444 https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaa5367888466f032a79c6869402282c5f
14	Post-Z Coverage	Causes SV_Coverage to reflect the active sample points after performing the depth test. Only applicable when SV_Coverage is used as an input to the PS.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::PostZCoverageEnable	N/A	N/A	Maxwell 2.0	None	None	None. (NVAPI has a D3D12 extension)	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_post_depth_coverage.txt http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
15	Coverage to Color	Causes an SV_Coverage mask to be converted to a [0, 1] floating point value and multiplied with the PS output color.	NvAPI_D3D11_CreateRasterizerState NvAPI_D3D11_RASTERIZER_DESC_EX::CoverageToColorEnable NvAPI_D3D11_RASTERIZER_DESC_EX::CoverageToColorRTIndex	N/A	N/A	Maxwell 2.0	None	None	None. (NVAPI has a D3D12 extension)	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_coverage_to_color.txt http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
16	Alias MSAA texture as a non-MSAA texture	Creates an alias of an MSAA texture that can be viewed as a non-MSAA texture by shaders that read from it. The width and height of the alias are scaled up according to the MSAA mode: a 2xMSAA alias has 2x the width, a 4xMSAA alias has 2x the width and 2x the height, and an 8xMSAA alias has 4x the width and 2x the height. Possibly useful for using HW bilinear filtering when performing an MSAA resolve.	NvAPI_D3D11_AliasMSAATexture2DAsNonMSAA	N/A	N/A	Fermi?	N/A	None	None. (NVAPI has a D3D12 extension)	http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga4f6364cba8cc3a6cbd45d282b413b03d
17	MultiDrawIndirect	Like DrawInstancedIndirect and DrawIndexedInstancedIndirect, except there's an additional parameter for the draw count. The GPU then loops over the draw count, and indexes into a buffer containing args for each draw. May cause your GPU's frontend to have a nervous breakdown. NOTE: Nvidia's version only supports passing a draw count from the CPU, while AMD supports both CPU-side and GPU-side draw counts.	NvAPI_D3D11_MultiDrawInstancedIndirect NvAPI_D3D11_MultiDrawIndexedInstancedIndirect	agsDriverExtensionsDX11_MultiDrawInstancedIndirect agsDriverExtensionsDX11_MultiDrawIndexedInstancedIndirect agsDriverExtensionsDX11_MultiDrawInstancedIndirectCountIndirect agsDriverExtensionsDX11_MultiDrawIndexedInstancedIndirectCountIndirect	N/A	Fermi?	Southern Islands	None	D3D12 natively supports ExecuteIndirect, which is a superset of MultiDrawIndirect functionality	https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#ga61b8abec809f1a11768d7fb9ae34ec1d https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#ga32a90d7d4e3b0f5a2fbbb8e2a6d49016 https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#gab94ccbaabcf176631416e73bdfca99e0 https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#gac1dbfb2ec7f0918450b5a02de4d058f8
18	Quad List Primitives	Enables rendering using a list of quads instead of triangles. Pretend that you're developing for the Sega Saturn!	N/A	agsDriverExtensionsDX11_IASetPrimitiveTopology	N/A	N/A	Southern Islands	None	None	https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaf1635db8ecaaefa20b4950a9191fdcb6
19	Multi-View Rendering	Allows replicating your draw calls to multiple viewports and/or render target array slices. The intended use case is stereoscopic rendering for 3D or VR, which requires drawing and rasterizing your geometry twice in the simplest case. Nvidia's version is called "single-pass stereo", and lets you specify separate post-projection X values for each eye from the vertex/geometry/domain shader. AMD's version lets you specify a viewport mask with optional clipping rectangles, and also lets you access the viewport/RT slice index in the shader.	NvAPI_D3D_SetSinglePassStereoMode NvAPI_D3D_QuerySinglePassStereoSupport	agsDriverExtensionsDX11_SetViewBroadcastMasks agsDriverExtensionsDX11_GetMaxClipRects agsDriverExtensionsDX11_SetClipRects AmdDxExtShaderIntrinsics_GetViewportIndex AmdDxExtShaderIntrinsics_GetViewportIndexPsOnly AmdDxExtShaderIntrinsics_GetRTArraySlice AmdDxExtShaderIntrinsics_GetRTArraySlicePsOnly	N/A	Pascal	Southern Islands?	None	Optional tiered support in D3D12 (view instancing) as of Windows 10 version 1703 (AKA Creators Update). The highest tier level includes shader support for SV_ViewID, which also controls which shader stages are replicated per-view.	http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga874782dc7d22a7a946164fb2047b504f http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaf55ec1713d3d5a4a9a933f1cd020ad19 https://gpuopen-librariesandsdks.github.io/ags/group__multiview.html#gaa5f9d9b7b45d88824c03ff397036664d https://developer.nvidia.com/pascal-vr-tech
20	Modified Post-Projection W	Lets you modify the post-projection W component using separate coefficients per viewport. The main use case is what Nvidia calls "Lens-Matched Shading", which effectively lets you taper off the rasterization/shading resolution towards the edges of a single view. This better matches the non-linear warping that's applied to images before they're displayed in a VR headset.	NvAPI_D3D_SetModifiedWMode NvAPI_D3D_QueryModifiedWSupport	N/A	N/A	Pascal	N/A	None	None. (NVAPI has a D3D12 extension)	http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gab45e2704ef90ef7b1276761862e05c73 http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa71a96da9ed91f3bcbeb156df601bbd3 https://developer.nvidia.com/pascal-vr-tech Also see the MultiProjection sample in Nvidia's VRWorks SDK
21	Late Latching	Normally constant buffers need to be updated by the CPU before issuing draw/dispatch calls, which means in practice the update happens well before the GPU actually executes the draw/dispatch. Late latching lets the CPU update the buffer just before the GPU reads from it, hence the "late" part of the name. The primary use case is VR, where you want to query the headset's pose and update the camera matrices as late as possible in order to reduce motion-to-photon latency.	NvAPI_D3D_CreateLateLatchObject NvAPI_D3D_QueryLateLatchSupport	N/A	N/A	Maxwell 2.0?	N/A	None	D3D12 by nature gives you significantly more control over resource updating and CPU/GPU synchronization. There's also the ID3D12GraphicsCommandList1::AtomicCopyBufferUINT function that's available as of Windows 10 version 1703, which atomically copies a UINT between buffers (along with optional dependent resources) as a single operation.	http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaf9fde46181b681155c375584e488d17f http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga34b758d8ad67c7e870d9ab14bfd78a90 Also see the LateLatch sample in Nvidia's VRWorks SDK
22	Driver Shader Compiler Control	Drivers need to JIT compile DXBC bytecode into the native ISA of the GPU before it can run a shader program. Drivers will often handle this by spawning background threads to compile the shaders asynchronously when the D3D shader object is created, and will possibly have to sync on those threads when a draw/dispatch is issued that uses the shader (which can cause hitching). These background threads can compete with the game's threads, and if a game is already creating the shaders as part of an async loading thread then you may be better off telling the driver not to spawn its own tasks.	N/A	agsDriverExtensionsDX11_SetMaxAsyncCompileThreadCount agsDriverExtensionsDX11_NumPendingAsyncCompileJobs agsDriverExtensionsDX11_SetDiskShaderCacheEnabled	N/A	N/A	Southern Islands	None	In D3D12 you're guaranteed to get compiled ISA when you create the PSO, since all of the required state is known up-front. Later versions of Windows 10 (1903 and later) also allow the driver to recompile the shaders in the background after PSO creation, which they may do as a form of profile-guided optimization.	https://gpuopen-librariesandsdks.github.io/ags/group__shadercompiler.html https://devblogs.microsoft.com/directx/background-shader-optimizations/ https://docs.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device6-setbackgroundprocessingmode
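The MultiDrawIndirect row above is easiest to picture as a loop running in the GPU frontend. The C++ sketch below models that loop on the CPU. It rests on some assumptions: `DrawIndexedArgs` mirrors the five 32-bit arguments that D3D11's DrawIndexedInstancedIndirect reads from its args buffer (IndexCountPerInstance, InstanceCount, StartIndexLocation, BaseVertexLocation, StartInstanceLocation). The names `packArgs`, `multiDrawIndexedIndirect`, `multiDrawIndexedIndirectCount` and `IssueDrawFn` are hypothetical and are not the NVAPI or AGS signatures. The second function models the GPU-side draw count that only AMD's version supports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Same field order as the five 32-bit values DrawIndexedInstancedIndirect reads.
struct DrawIndexedArgs {
    uint32_t indexCountPerInstance;
    uint32_t instanceCount;
    uint32_t startIndexLocation;
    int32_t  baseVertexLocation;
    uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawIndexedArgs) == 20, "args must be five tightly packed 32-bit values");

// Hypothetical stand-in for "the GPU executes one draw with these args".
using IssueDrawFn = std::function<void(const DrawIndexedArgs&)>;

// Writes draw records into a byte buffer at a fixed stride, the way an app
// (or a GPU culling pass) might fill the indirect args buffer.
std::vector<uint8_t> packArgs(const std::vector<DrawIndexedArgs>& draws, size_t byteStride) {
    assert(byteStride >= sizeof(DrawIndexedArgs));
    std::vector<uint8_t> buffer(draws.size() * byteStride, 0);
    for (size_t i = 0; i < draws.size(); ++i)
        std::memcpy(buffer.data() + i * byteStride, &draws[i], sizeof(DrawIndexedArgs));
    return buffer;
}

// CPU-provided draw count: loop drawCount times, reading one record per
// iteration from alignedByteOffset + i * byteStride.
void multiDrawIndexedIndirect(const std::vector<uint8_t>& argsBuffer, uint32_t drawCount,
                              size_t alignedByteOffset, size_t byteStride,
                              const IssueDrawFn& issueDraw) {
    assert(byteStride >= sizeof(DrawIndexedArgs));
    for (uint32_t i = 0; i < drawCount; ++i) {
        size_t offset = alignedByteOffset + i * byteStride;
        // Out-of-range record: this sketch simply stops (real hardware makes no such promise).
        if (offset + sizeof(DrawIndexedArgs) > argsBuffer.size()) break;
        DrawIndexedArgs args;
        std::memcpy(&args, argsBuffer.data() + offset, sizeof(args));
        issueDraw(args);
    }
}

// GPU-provided draw count: the count itself is read from a buffer at execution
// time. Clamping it to a caller-supplied maximum is an assumption of this sketch.
void multiDrawIndexedIndirectCount(const std::vector<uint8_t>& argsBuffer,
                                   const std::vector<uint8_t>& countBuffer,
                                   size_t countByteOffset, uint32_t maxDrawCount,
                                   size_t alignedByteOffset, size_t byteStride,
                                   const IssueDrawFn& issueDraw) {
    if (countByteOffset + sizeof(uint32_t) > countBuffer.size()) return;
    uint32_t gpuCount;
    std::memcpy(&gpuCount, countBuffer.data() + countByteOffset, sizeof(gpuCount));
    uint32_t drawCount = gpuCount < maxDrawCount ? gpuCount : maxDrawCount;
    multiDrawIndexedIndirect(argsBuffer, drawCount, alignedByteOffset, byteStride, issueDraw);
}
```

The stride is a parameter rather than assuming tightly packed records. This mirrors the per-draw stride that multi-draw APIs commonly expose, and it leaves room for per-draw data (such as a material ID) after each args record.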
23
24	========================= Shader Hacks ==============================================
25
26	Feature	Description	NVAPI Function(s)	AGS Function(s)	IGFX Function(s)	NV Support	AMD Support	Intel Support	D3D11.x/D3D12 Support	References
27	PixelSync	Enforces ordered access for UAV r/w operations based on primitive submission order. Useful for OIT, volumetric shadows, programmable blending, voxelization, and solving world hunger.	N/A	N/A	IntelExt_BeginPixelShaderOrdering IntelExt_BeginPixelShaderOrderingOnUAV	None	None	Haswell	Rasterizer Ordered Views provide equivalent functionality in D3D11.3 and D3D12 with FL 12_1	http://advances.realtimerendering.com/s2013/2013-07-23-SIGGRAPH-PixelSync.pdf https://software.intel.com/en-us/articles/programmable-blend-with-pixel-shader-ordering https://software.intel.com/en-us/blogs/2013/07/18/order-independent-transparency-approximation-with-pixel-synchronization https://software.intel.com/en-us/blogs/2013/03/27/adaptive-volumetric-shadow-maps
28	"Fast" Geometry Shader	Allows creating a "pass-through" GS that can output a mask indicating which viewports a triangle should be rasterized to. Useful for multi-resolution VR, cubemap rendering, and voxelization. Graphics programmers remain skeptical after 9 years of being let down by geometry shaders.	NvAPI_D3D11_CreateFastGeometryShader NvAPI_D3D11_CreateFastGeometryShaderExplicit	N/A	N/A	Maxwell 2.0	None	None	None. (NVAPI has a D3D12 extension)	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf https://developer.nvidia.com/virtual-reality-development https://developer.nvidia.com/sites/default/files/akamai/gameworks/vr/GameWorks_VR_2015_Final_handouts.pdf https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_geometry_shader_passthrough.txt https://www.opengl.org/registry/specs/NV/viewport_array2.txt Also, see Nvidia's Multi-ResVR sample included in the Gameworks VR SDK
29	Lane Shuffles	Performs a SIMD shuffle between the lanes of a warp, or a subgroup inside of a warp. Can broadcast the value from one lane to all lanes, shuffle based on a delta, or shuffle based on an XOR of the lane ID. Useful for fast reductions that don't require shared memory. Also useful for letting the world know that you're ready to stop pretending that GPUs aren't SIMD.	NvShfl NvShflUp NvShflDown NvShflXor	AmdDxExtShaderIntrinsics_ReadfirstlaneF/U AmdDxExtShaderIntrinsics_ReadlaneF/U AmdDxExtShaderIntrinsics_SwizzleF/U	IntelExt_WaveReadLaneFirst IntelExt_WaveReadLaneAt IntelExt_QuadReadAcrossDiagonal IntelExt_QuadReadLaneAt IntelExt_QuadReadAcrossX IntelExt_QuadReadAcrossY	Kepler	Southern Islands	Haswell?	Supported in D3D12 with Shader Model 6.0	http://devblogs.nvidia.com/parallelforall/faster-parallel-reductions-kepler/ https://www.opengl.org/registry/specs/NV/shader_thread_shuffle.txt http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-shuffle-functions https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl https://developer.nvidia.com/reading-between-threads-shader-intrinsics http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/ https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
30	Lane Voting	Allows usage of ballot/any/all functionality in a shader. Any() returns true if a value is true on any active lane of a warp. All() returns true if a value is true on all active lanes of a warp. Ballot() returns a bitfield where each bit indicates whether the specified value was true for the corresponding lane of the warp.	NvAny NvAll NvBallot	AmdDxExtShaderIntrinsics_Ballot AmdDxExtShaderIntrinsics_BallotAny AmdDxExtShaderIntrinsics_BallotAll	IntelExt_WaveActiveBallot IntelExt_WaveActiveAllTrue IntelExt_WaveActiveAllEqual IntelExt_WaveAll	Fermi	Southern Islands	Haswell?	Supported in D3D12 with Shader Model 6.0	http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-vote-functions https://www.opengl.org/registry/specs/NV/shader_thread_group.txt https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl https://developer.nvidia.com/reading-between-threads-shader-intrinsics http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/ https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
31	Lane ID	Returns the current thread's lane ID within a warp/wavefront.	NvGetLaneId	AmdDxExtShaderIntrinsics_LaneId	IntelExt_WaveGetLaneIndex	Kepler	Southern Islands	Haswell?	Supported in D3D12 with Shader Model 6.0	https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl https://developer.nvidia.com/reading-between-threads-shader-intrinsics http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/ https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
32	Count Active Lanes	Counts the number of active lanes within a warp/wavefront that have a lower index than the current lane.	N/A	AmdDxExtShaderIntrinsics_MBCnt	N/A	None	Southern Islands	None	Supported in D3D12 with Shader Model 6.0	http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
33	FP32 Atomics	Performs atomic adds on fp32 values in RWByteAddressBuffers or RWTextures.	NvInterlockedAddFp32	N/A	N/A	Kepler	None	None	None.	https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
34	FP16 Atomics	Performs atomic add/min/max on groups of 2 or 4 fp16 values in RWByteAddressBuffers or RWTextures.	NvInterlockedAddFp16x2 NvInterlockedMinFp16x2 NvInterlockedMaxFp16x2 NvInterlockedAddFp16x4 NvInterlockedMinFp16x4 NvInterlockedMaxFp16x4	N/A	N/A	Maxwell 2.0	None	None	None.	http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_shader_atomic_fp16_vector.txt https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
35	U64 Atomics	Performs atomic add/min/max/and/or/xor/exchange on a 64-bit unsigned integer in RWByteAddressBuffers or RWTextures.	N/A	AmdDxExtShaderIntrinsics_AtomicOp	N/A	None	Southern Islands	None	None.	https://github.com/GPUOpen-LibrariesAndSDKs/AGS_SDK/blob/master/ags_lib/hlsl/ags_shader_intrinsics_dx11.hlsl
36	UAV Typed Loads	Allows reading from Texture UAVs that have formats other than R32_UINT/R32_SINT/R32_FLOAT, effectively bypassing The Most Annoying Restriction In The History Of Graphics APIs™. Stock D3D11 does allow aliasing most 32-bit formats (such as R8G8B8A8_UNORM) as R32_UINT, allowing for manual packing and unpacking.	NvLoadUavTyped	N/A	N/A	Fermi?	None	None	Natively supported for at least 18 formats in D3D11.3/D3D12 with FL 12_0, optional support for the rest.	https://msdn.microsoft.com/en-us/library/windows/desktop/ff728749(v=vs.85).aspx
37	3-parameter Min/Max/Med	Returns the min, max or median value from a set of 3 parameters.	N/A	AmdDxExtShaderIntrinsics_Min3F/U AmdDxExtShaderIntrinsics_Med3F/U AmdDxExtShaderIntrinsics_Max3F/U	N/A	None	Southern Islands	None	None. (AGS has D3D12 support)	http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
38	Barycentrics and Interpolation	Provides the pixel shader with access to the barycentrics used for interpolating vertex attributes, allowing for programmable interpolation. Also handy for implementing deferred rendering with visibility buffers.	N/A	AmdDxExtShaderIntrinsics_IjBarycentricCoords AmdDxExtShaderIntrinsics_PullModelBarycentricCoords AmdDxExtShaderIntrinsics_VertexParameter AmdDxExtShaderIntrinsics_VertexParameterComponent	N/A	None	Southern Islands	None	Optional support in D3D12 with Shader Model 6.1. (AGS has D3D12 support)	http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/ http://gpuopen.com/gaming-product/barycentrics12-dx12-gcnshader-ext-sample/
39	Wave Reduction	Performs an operation on all active lanes of the current wavefront (such as a sum, min, or max) and returns the result. It's simpler and faster than using thread group shared memory to do the same thing!	N/A	AmdDxExtShaderIntrinsics_WaveReduce AmdDxExtShaderIntrinsics_WaveActiveSum AmdDxExtShaderIntrinsics_WaveActiveProduct AmdDxExtShaderIntrinsics_WaveActiveMin AmdDxExtShaderIntrinsics_WaveActiveMax AmdDxExtShaderIntrinsics_WaveActiveBitAnd AmdDxExtShaderIntrinsics_WaveActiveBitOr AmdDxExtShaderIntrinsics_WaveActiveBitXor	IntelExt_WaveActiveBitAnd IntelExt_WaveActiveBitOr IntelExt_WaveActiveCountBits IntelExt_WaveActiveMax IntelExt_WaveActiveMin IntelExt_WaveActiveProduct IntelExt_WaveActiveSum	None	Southern Islands	Haswell?	Native support in D3D12 with Shader Model 6.0.	https://gpuopen.com/amd-gpu-services-5-1-1/ https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
40	Wave Scan	Similar to wave reductions, except these perform the operation on all active lanes with a lower index than your own. So if you ran a prefix sum on lane 4 (with all lanes active), it would give you the sum of the values from lanes 0, 1, 2, and 3. A postfix sum does the same, but also includes your own lane. (Nvidia uses the terms "exclusive" and "inclusive" for what AMD calls prefix and postfix, respectively.)	NvWaveMultiPrefixInclusiveAdd NvWaveMultiPrefixExclusiveAdd NvWaveMultiPrefixInclusiveAnd NvWaveMultiPrefixExclusiveAnd NvWaveMultiPrefixInclusiveOr NvWaveMultiPrefixExclusiveOr NvWaveMultiPrefixInclusiveXOr NvWaveMultiPrefixExclusiveXOr	AmdDxExtShaderIntrinsics_WaveScan AmdDxExtShaderIntrinsics_WavePrefixSum AmdDxExtShaderIntrinsics_WavePrefixProduct AmdDxExtShaderIntrinsics_WavePrefixMin AmdDxExtShaderIntrinsics_WavePrefixMax AmdDxExtShaderIntrinsics_WavePostfixSum AmdDxExtShaderIntrinsics_WavePostfixProduct AmdDxExtShaderIntrinsics_WavePostfixMin AmdDxExtShaderIntrinsics_WavePostfixMax	IntelExt_WavePrefixCountBits IntelExt_WavePrefixProduct IntelExt_WavePrefixSum	Kepler?	Southern Islands	Haswell?	Native support in D3D12 with Shader Model 6.0 (no postfix intrinsics, but these can be trivially implemented on your own).	https://gpuopen.com/amd-gpu-services-5-1-1/ https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
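Several of the shader rows above describe per-wave semantics that are easy to misread: Lane Voting, Count Active Lanes, Wave Scan, and shuffle-based reductions from Lane Shuffles. The C++ sketch below models one wave on the CPU, with an explicit active-lane bitmask, to pin those semantics down. It is a reference model and not vendor code. The function names are hypothetical, the 64-lane width assumes a GCN-style wavefront (Nvidia warps are 32 lanes), and the XOR-shuffle reduction assumes every lane is active.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstdint>

// Simulated wave width: 64 matches a GCN wavefront; Nvidia warps are 32 lanes.
constexpr int kWaveSize = 64;
using LaneValues = std::array<uint32_t, kWaveSize>;
using LanePredicate = std::array<bool, kWaveSize>;

inline bool isActive(uint64_t activeMask, int lane) { return ((activeMask >> lane) & 1u) != 0; }

// Ballot: bit i is set when lane i is active and its predicate is true.
uint64_t waveBallot(const LanePredicate& pred, uint64_t activeMask) {
    uint64_t result = 0;
    for (int lane = 0; lane < kWaveSize; ++lane)
        if (isActive(activeMask, lane) && pred[lane]) result |= uint64_t{1} << lane;
    return result;
}

// Any/All only look at active lanes.
bool waveAny(const LanePredicate& pred, uint64_t activeMask) { return waveBallot(pred, activeMask) != 0; }
bool waveAll(const LanePredicate& pred, uint64_t activeMask) { return waveBallot(pred, activeMask) == activeMask; }

// MBCnt-style count: number of set bits in 'mask' belonging to lanes below 'lane'.
// Passing the active-lane mask gives "active lanes with a lower index".
uint32_t countLanesBelow(uint64_t mask, int lane) {
    assert(lane >= 0 && lane < kWaveSize);
    uint64_t lowerLanes = (uint64_t{1} << lane) - 1;
    return static_cast<uint32_t>(std::bitset<64>(mask & lowerLanes).count());
}

// Exclusive scan (AMD "prefix", Nvidia "exclusive"): each active lane gets the
// sum of the values from active lanes with a lower index. Inactive lanes stay 0.
LaneValues wavePrefixSum(const LaneValues& v, uint64_t activeMask) {
    LaneValues out{};
    uint32_t running = 0;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        if (!isActive(activeMask, lane)) continue;
        out[lane] = running;
        running += v[lane];
    }
    return out;
}

// Inclusive scan (AMD "postfix", Nvidia "inclusive"): the exclusive result
// plus the lane's own value.
LaneValues wavePostfixSum(const LaneValues& v, uint64_t activeMask) {
    LaneValues out = wavePrefixSum(v, activeMask);
    for (int lane = 0; lane < kWaveSize; ++lane)
        if (isActive(activeMask, lane)) out[lane] += v[lane];
    return out;
}

// Sum reduction built from XOR lane shuffles, assuming all lanes are active.
// Each step reads every lane before writing, as a SIMD shuffle would; after
// log2(kWaveSize) steps every lane holds the total.
LaneValues waveReduceSumXor(LaneValues v) {
    for (int offset = kWaveSize / 2; offset > 0; offset /= 2) {
        LaneValues shuffled;
        for (int lane = 0; lane < kWaveSize; ++lane) shuffled[lane] = v[lane ^ offset];
        for (int lane = 0; lane < kWaveSize; ++lane) v[lane] += shuffled[lane];
    }
    return v;
}
```

`wavePostfixSum` is just the exclusive scan plus the lane's own value. This is the kind of trivial implementation the Wave Scan row alludes to, since Shader Model 6.0's WavePrefixSum is exclusive.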