D3D11 Vendor Hacks
This is a list of D3D11 vendor/driver hacks, inspired by Aras's list of D3D9 GPU Hacks.
None of the features listed here are natively available in D3D11 at FEATURE_LEVEL_11_0, which is the highest feature level supported on Windows 7.
The NV/AMD/Intel support entries indicate the minimum GPU on which you can use the listed extension(s) for that vendor. They do not necessarily reflect the actual capabilities of the hardware, only the functionality that's exposed through D3D11 extensions. Entries with a question mark are unconfirmed and are just my best guess at the moment. Please let me know if you have information about the supported hardware, or can help confirm hardware support for a particular feature.
========================= Rendering Hacks ==============================================
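Each entry below lists the feature, a description, the NVAPI (Nvidia), AGS (AMD), and IGFX (Intel) functions that expose it, the minimum supported GPU for each vendor (NV / AMD / Intel), the feature's status in D3D11.x/D3D12, and references.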

UAV Overlap
Description: Tells the driver to skip synchronization between draw or dispatch calls that use UAVs. Normally the driver will sync after each draw or dispatch that writes to a UAV, in order to prevent hazards when threads from different draws/dispatches access the same memory. Using this extension can allow multiple draws/dispatches to run in parallel on the GPU, and will also let you keep your QA team busy with difficult-to-repro sync bugs!
NVAPI function(s): NvAPI_D3D11_BeginUAVOverlapEx, NvAPI_D3D11_EndUAVOverlap
AGS function(s): agsDriverExtensionsDX11_BeginUAVOverlap, agsDriverExtensionsDX11_EndUAVOverlap
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / Southern Islands / None
D3D11.x/D3D12 support: Equivalent behavior can be obtained by omitting barriers in D3D12.
References:
https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#ga16f7cfc4d3c436b211f299341e25c801
https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gae22fcecf7799dfd5aae4bfd308e6444e
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gac3f34cbd997bdb51478ada50255a9dd7
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaeb78a97e256f3c6c511451dded3994e5
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaed6aaa526d5a4729d9524039eae4c825

Depth Bounds Test
Description: Rejects all pixels whose stored depth falls outside a range specified by a minimum and maximum depth. The stored depth is the value already in the depth buffer at that location, not the incoming pixel's depth. Originally developed for accelerating stencil shadows, but can also be used when accumulating deferred lights or projective decals.
NVAPI function(s): NvAPI_D3D11_SetDepthBoundsTest
AGS function(s): agsDriverExtensionsDX11_SetDepthBounds
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / Southern Islands / None
D3D11.x/D3D12 support: Optional support as of Windows 10 version 1703 (the Creators Update). NVAPI also has a D3D12 extension.
References:
https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaf1635db8ecaaefa20b4950a9191fdcb6
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga0502f9d58555b662a3b6fcc9b61b7d2a
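
To make the semantics concrete, here is a minimal CPU emulation in Python (not GPU code). It assumes inclusive bounds, as D3D12's depth bounds test uses. The helper name `depth_bounds_mask` is hypothetical and is not part of NVAPI or AGS.

```python
def depth_bounds_mask(depth_buffer, z_min, z_max):
    """Per pixel, whether a draw survives the depth bounds test.

    The test reads the depth already stored in the depth buffer (not the
    incoming pixel's depth) and keeps the pixel only if
    z_min <= stored <= z_max (inclusive bounds assumed)."""
    return [[z_min <= d <= z_max for d in row] for row in depth_buffer]


# A 2x3 depth buffer: sky (1.0) plus surfaces at various depths.
depth = [
    [1.0, 0.35, 0.40],
    [0.90, 0.10, 0.55],
]
# Depth range spanned by a hypothetical deferred light volume.
mask = depth_bounds_mask(depth, 0.30, 0.60)
```

For a deferred light, pixels whose stored geometry lies entirely in front of or behind the light's depth range are rejected, so the lighting shader never runs for them.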

Forced MSAA Sample Count
Description: Forces a specified MSAA sample count regardless of the render targets and depth targets bound. Can be used to implement MSAA variants that don't require the full storage and bandwidth cost of MSAA render targets.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::ForcedSampleCount
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / None / None
D3D11.x/D3D12 support: Target-independent rasterization provides equivalent functionality, available in D3D11.1 and D3D12 with FL 11_1.
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444

Programmable MSAA Sample Positions
Description: Allows specifying the location of MSAA sample points within a pixel. Can be used to implement interleaved sampling, jittered sampling, or poor man's decoupled shading.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::ProgrammableSamplePositionsEnable, NvAPI_D3D11_RASTERIZER_DESC_EX::InterleavedSamplingEnable, NvAPI_D3D11_RASTERIZER_DESC_EX::SamplePositionsX, NvAPI_D3D11_RASTERIZER_DESC_EX::SamplePositionsY
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: Optional support with multiple tiers as of Windows 10 version 1703 (the Creators Update). NVAPI has a D3D12 extension.
References:
https://mynameismjp.wordpress.com/2015/09/13/programmable-sample-points/
http://www.geforce.com/hardware/technology/mfaa/technology
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_sample_locations.txt
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444

Conservative Rasterization
Description: Causes a pixel to be shaded if any part of the pixel is covered by a primitive, instead of only testing coverage at one or more sample points. Useful for voxelization, occlusion culling, analytical antialiasing, or tiled light binning. Note that this will typically result in vertex attributes being extrapolated past triangle edges, since they are still interpolated to the pixel center before shading.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::ConservativeRasterEnable
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: Optional support with multiple tiers in D3D12 and D3D11.3 with FL 12_1.
References:
https://developer.nvidia.com/content/dont-be-conservative-conservative-rasterization
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
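
The difference between the two coverage rules can be sketched on the CPU for a single pixel. This Python emulation assumes counter-clockwise winding with y pointing up, and treats pixel (x, y) as the square [x, x+1] x [y, y+1]. It ignores fill rules and the small uncertainty region that real hardware is allowed. All names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function for a->b; >= 0 means (px, py) is on the inner side of a
    counter-clockwise triangle (y pointing up)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def edges(tri):
    """The three directed edges (a, b) of a triangle given as a list of points."""
    return zip(tri, tri[1:] + tri[:1])


def covers_center(tri, x, y):
    """Standard rasterization: a single sample at the pixel center."""
    cx, cy = x + 0.5, y + 0.5
    return all(edge(ax, ay, bx, by, cx, cy) >= 0 for (ax, ay), (bx, by) in edges(tri))


def covers_conservative(tri, x, y):
    """Overestimating conservative rasterization: does the triangle touch the
    pixel square anywhere? Separating-axis test: the triangle's bounding box
    against the square, then each edge evaluated at the square corner that
    lies furthest toward that edge's inner side."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    if max(xs) < x or min(xs) > x + 1 or max(ys) < y or min(ys) > y + 1:
        return False
    for (ax, ay), (bx, by) in edges(tri):
        # The edge function grows with px when ay > by, and with py when bx > ax.
        px = x + 1 if ay > by else x
        py = y + 1 if bx > ax else y
        if edge(ax, ay, bx, by, px, py) < 0:
            return False
    return True
```

For the triangle (0, 0), (4.5, 0), (0, 4.5), pixel (3, 1) has its center outside the triangle, but a corner of the pixel lies inside it. Only the conservative rule shades it.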

Quad Filling
Description: Causes all pixels within a triangle's screen-space AABB to be shaded. Can also enable a mode where the entire viewport is shaded (Nvidia only). On AMD, vertex attributes are not properly interpolated, so only SV_Position will be valid.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::QuadFillMode
AGS function(s): agsDriverExtensionsDX11_IASetPrimitiveTopology
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / Southern Islands / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
https://www.opengl.org/registry/specs/NV/fill_rectangle.txt
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaa5367888466f032a79c6869402282c5f
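
As a rough illustration, this hypothetical Python helper lists the pixels that quad-fill mode shades: every pixel whose center falls inside the triangle's screen-space bounding box. It ignores fill-rule tie-breaking at the box edges.

```python
import math


def quad_fill_pixels(tri):
    """Pixels shaded in quad-fill mode: every pixel (x, y) whose center
    (x + 0.5, y + 0.5) lies inside the triangle's screen-space bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    # Integer x with min(xs) <= x + 0.5 <= max(xs), and likewise for y.
    x_lo, x_hi = math.ceil(min(xs) - 0.5), math.floor(max(xs) - 0.5)
    y_lo, y_hi = math.ceil(min(ys) - 0.5), math.floor(max(ys) - 0.5)
    return [(x, y) for y in range(y_lo, y_hi + 1) for x in range(x_lo, x_hi + 1)]
```

Take the triangle (0, 0), (4, 0), (0, 2). All 8 pixels of its 4x2 box are shaded, including (3, 1), whose center lies outside the triangle itself.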

Post-Z Coverage
Description: Causes SV_Coverage to reflect the active sample points after performing the depth test. Only applicable when SV_Coverage is used as an input to the PS.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::PostZCoverageEnable
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_post_depth_coverage.txt
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444
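
This minimal Python emulation (hypothetical names) shows what changes for one pixel. The pre-Z mask is just the rasterizer's coverage. The post-Z mask additionally drops samples that fail the depth test. It represents SV_Coverage as an integer bitmask and assumes a LESS depth comparison.

```python
def coverage_masks(raster_mask, sample_depths, stored_depths):
    """Return (pre-Z, post-Z) SV_Coverage masks for one pixel.

    raster_mask:   bit i set if the primitive covers sample i.
    sample_depths: the primitive's interpolated depth at each sample.
    stored_depths: the depth buffer's value at each sample.
    A LESS depth comparison is assumed."""
    post_z = 0
    for i, (z, stored) in enumerate(zip(sample_depths, stored_depths)):
        if raster_mask & (1 << i) and z < stored:
            post_z |= 1 << i
    return raster_mask, post_z
```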

Coverage to Color
Description: Causes an SV_Coverage mask to be converted to a [0, 1] floating point value and multiplied with the PS output color.
NVAPI function(s): NvAPI_D3D11_CreateRasterizerState, NvAPI_D3D11_RASTERIZER_DESC_EX::CoverageToColorEnable, NvAPI_D3D11_RASTERIZER_DESC_EX::CoverageToColorRTIndex
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_coverage_to_color.txt
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa05c2e42fdf9f7ead12acb291f1b9444

Alias MSAA Texture as a Non-MSAA Texture
Description: Creates an alias of an MSAA texture that can be viewed as a non-MSAA texture in shaders that read from it. The alias's dimensions are scaled so that it holds one texel per sample: a 2x MSAA alias has 2x the width, a 4x MSAA alias has 2x the width and 2x the height, and an 8x MSAA alias has 4x the width and 2x the height. Possibly useful for using HW bilinear filtering when performing an MSAA resolve.
NVAPI function(s): NvAPI_D3D11_AliasMSAATexture2DAsNonMSAA
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / N/A / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga4f6364cba8cc3a6cbd45d282b413b03d
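
The dimension rule reduces to a small lookup table. This Python sketch uses a hypothetical helper, not the NVAPI signature, and only computes the alias's size. Which sample maps to which texel within each 2x1, 2x2, or 4x2 block is not covered here.

```python
# (width multiplier, height multiplier) per MSAA sample count, per the
# description above. The alias holds exactly one texel per sample.
ALIAS_SCALE = {2: (2, 1), 4: (2, 2), 8: (4, 2)}


def alias_dimensions(width, height, sample_count):
    """Size of the non-MSAA alias of a width x height MSAA texture
    (hypothetical helper for illustration)."""
    sx, sy = ALIAS_SCALE[sample_count]
    return width * sx, height * sy
```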

MultiDrawIndirect
Description: Like DrawInstancedIndirect and DrawIndexedInstancedIndirect, except there's an additional parameter for the draw count. The GPU then loops over the draw count, and indexes into a buffer containing args for each draw. May cause your GPU's frontend to have a nervous breakdown. NOTE: Nvidia's version only supports passing a draw count from the CPU, while AMD supports both CPU-side and GPU-side draw counts.
NVAPI function(s): NvAPI_D3D11_MultiDrawInstancedIndirect, NvAPI_D3D11_MultiDrawIndexedInstancedIndirect
AGS function(s): agsDriverExtensionsDX11_MultiDrawInstancedIndirect, agsDriverExtensionsDX11_MultiDrawIndexedInstancedIndirect, agsDriverExtensionsDX11_MultiDrawInstancedIndirectCountIndirect, agsDriverExtensionsDX11_MultiDrawIndexedInstancedIndirectCountIndirect
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / Southern Islands / None
D3D11.x/D3D12 support: D3D12 natively supports ExecuteIndirect, which is a superset of MultiDrawIndirect functionality.
References:
https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#ga61b8abec809f1a11768d7fb9ae34ec1d
https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#ga32a90d7d4e3b0f5a2fbbb8e2a6d49016
https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#gab94ccbaabcf176631416e73bdfca99e0
https://gpuopen-librariesandsdks.github.io/ags/group__mdi.html#gac1dbfb2ec7f0918450b5a02de4d058f8
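
Conceptually, the GPU front end runs a loop like the one below. This Python emulation is a sketch. The argument layout follows the five DrawIndexedInstancedIndirect arguments. Clamping a GPU-side count to the CPU-provided maximum is an assumption borrowed from D3D12's ExecuteIndirect (MaxCommandCount); the AGS docs may describe it differently. `DrawIndexedArgs` and `multi_draw_indexed_indirect` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DrawIndexedArgs:
    # Same fields, in the same order, as DrawIndexedInstancedIndirect reads.
    index_count_per_instance: int
    instance_count: int
    start_index_location: int
    base_vertex_location: int
    start_instance_location: int


def multi_draw_indexed_indirect(draw, args, max_draw_count, gpu_draw_count=None):
    """Emulate the front-end loop of a multi-draw indirect call.

    draw:           stand-in for one DrawIndexedInstanced call.
    args:           the args buffer; must hold at least the final draw count.
    max_draw_count: the CPU-side draw count.
    gpu_draw_count: optional count read from a GPU buffer (assumed to be
                    clamped to max_draw_count).
    Returns the number of draws issued."""
    count = max_draw_count if gpu_draw_count is None else min(gpu_draw_count, max_draw_count)
    for a in args[:count]:
        draw(a.index_count_per_instance, a.instance_count, a.start_index_location,
             a.base_vertex_location, a.start_instance_location)
    return count
```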

Quad List Primitives
Description: Enables rendering using a list of quads instead of triangles. Pretend that you're developing for the Sega Saturn!
NVAPI function(s): N/A
AGS function(s): agsDriverExtensionsDX11_IASetPrimitiveTopology
IGFX function(s): N/A
Support (NV / AMD / Intel): N/A / Southern Islands / None
D3D11.x/D3D12 support: None
References:
https://gpuopen-librariesandsdks.github.io/ags/group__dx11misc.html#gaf1635db8ecaaefa20b4950a9191fdcb6
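
Without the extension, a quad list has to be expanded into a triangle list before submission. The hypothetical Python helper below does that expansion. Splitting along the v0-v2 diagonal is an assumption; it may not match how AMD's hardware splits quads.

```python
def quads_to_triangles(quad_indices):
    """Expand a quad list (4 indices per quad, in perimeter order) into an
    equivalent triangle list. Each quad (a, b, c, d) becomes (a, b, c) and
    (a, c, d), which preserves the quad's winding order."""
    if len(quad_indices) % 4:
        raise ValueError("quad list length must be a multiple of 4")
    tris = []
    for i in range(0, len(quad_indices), 4):
        a, b, c, d = quad_indices[i:i + 4]
        tris += [a, b, c, a, c, d]
    return tris
```

The expansion costs 6 indices per quad instead of 4, which is part of the appeal of native quad lists.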

Multi-View Rendering
Description: Allows replicating your draw calls to multiple viewports and/or render target array slices. The intended use case is stereoscopic rendering for 3D or VR, which in the simplest case requires drawing and rasterizing your geometry twice. Nvidia's version is called "single-pass stereo", and lets you specify separate post-projection X values for each eye from the vertex/geometry/domain shader. AMD's version lets you specify a viewport mask with optional clipping rectangles, and also lets you access the viewport/RT slice index in the shader.
NVAPI function(s): NvAPI_D3D_SetSinglePassStereoMode, NvAPI_D3D_QuerySinglePassStereoSupport
AGS function(s): agsDriverExtensionsDX11_SetViewBroadcastMasks, agsDriverExtensionsDX11_GetMaxClipRects, agsDriverExtensionsDX11_SetClipRects, AmdDxExtShaderIntrinsics_GetViewportIndex, AmdDxExtShaderIntrinsics_GetViewportIndexPsOnly, AmdDxExtShaderIntrinsics_GetRTArraySlice, AmdDxExtShaderIntrinsics_GetRTArraySlicePsOnly
IGFX function(s): N/A
Support (NV / AMD / Intel): Pascal / Southern Islands? / None
D3D11.x/D3D12 support: Optional tiered support as of Windows 10 version 1703 (the Creators Update). The highest tier level includes shader support for SV_ViewID, which also controls which shader stages are replicated per-view.
References:
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga874782dc7d22a7a946164fb2047b504f
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaf55ec1713d3d5a4a9a933f1cd020ad19
https://gpuopen-librariesandsdks.github.io/ags/group__multiview.html#gaa5f9d9b7b45d88824c03ff397036664d
https://developer.nvidia.com/pascal-vr-tech
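
A viewport broadcast mask behaves as if each draw were replayed once per set bit of the mask, with the shader able to read which viewport it is running for. This is a Python emulation of that behavior, not of how the hardware implements it. `broadcast_draw` is a hypothetical name.

```python
def broadcast_draw(viewport_mask, draw):
    """Replay one draw for every set bit of viewport_mask. `draw` receives the
    viewport index, i.e. what a shader would read back through an intrinsic
    such as AmdDxExtShaderIntrinsics_GetViewportIndex. Returns the indices used."""
    issued = []
    index, mask = 0, viewport_mask
    while mask:
        if mask & 1:
            draw(index)
            issued.append(index)
        mask >>= 1
        index += 1
    return issued
```

For stereo, a mask of 0b11 renders the same geometry to viewport 0 (left eye) and viewport 1 (right eye) from a single API call.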

Modified Post-Projection W
Description: Lets you specify coefficients that modify the post-projection W component, with separate coefficients per viewport. The main use case is what Nvidia calls "Lens-Matched Shading", which effectively lets you taper off the rasterization/shading resolution towards the edges of a single view. This better matches the non-linear warping that's applied to images before they're displayed in a VR headset.
NVAPI function(s): NvAPI_D3D_SetModifiedWMode, NvAPI_D3D_QueryModifiedWSupport
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Pascal / N/A / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gab45e2704ef90ef7b1276761862e05c73
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaa71a96da9ed91f3bcbeb156df601bbd3
https://developer.nvidia.com/pascal-vr-tech
Also see the MultiProjection sample in Nvidia's VRWorks SDK.
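
A minimal Python sketch of the projection math. It assumes the modification has the form w' = w + a*x + b*y, with one (a, b) pair per viewport (for example, one per quadrant of the view). That is the form commonly described for Lens-Matched Shading, but it is an assumption here, not something taken from the NVAPI docs above. `modified_w_ndc` is a hypothetical name.

```python
def modified_w_ndc(clip, a, b):
    """Project a clip-space position (x, y, z, w) after replacing w with
    w' = w + a*x + b*y (assumed form).

    For the upper-right quadrant, where x and y are positive, positive a and b
    make w' grow toward the edges of the view, so geometry there is squeezed
    into fewer pixels than geometry near the center."""
    x, y, _z, w = clip
    w_mod = w + a * x + b * y
    return x / w_mod, y / w_mod
```

With a = b = 0.5, the inner half of the range, [0, 0.5], maps to [0, 1/3], while the outer half, [0.5, 1], maps to [1/3, 1/2]. The outer region therefore gets half the resolution of the inner one.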

Late Latching
Description: Normally constant buffers need to be updated by the CPU before issuing draw/dispatch calls, which in practice means the update happens well before the GPU actually executes the draw/dispatch. Late latching lets the CPU update the buffer just before the GPU reads from it, hence the "late" part of the name. The primary use case is VR, where you want to query the headset's pose and update the camera matrices as late as possible in order to reduce motion-to-photon latency.
NVAPI function(s): NvAPI_D3D_CreateLateLatchObject, NvAPI_D3D_QueryLateLatchSupport
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0? / N/A / None
D3D11.x/D3D12 support: D3D12 by nature gives you significantly more control over resource updating and CPU/GPU synchronization. There's also the ID3D12GraphicsCommandList1::AtomicCopyBufferUINT function, available as of Windows 10 version 1703, which lets you atomically update and copy as a single step.
References:
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#gaf9fde46181b681155c375584e488d17f
http://docs.nvidia.com/gameworks/content/gameworkslibrary/coresdk/nvapi/group__dx.html#ga34b758d8ad67c7e870d9ab14bfd78a90
Also see the LateLatch sample in Nvidia's VRWorks SDK.

Driver Shader Compiler Control
Description: Drivers need to JIT compile DXBC bytecode into the GPU's native ISA before it can run a shader program. Drivers often do this by spawning background threads that compile shaders asynchronously when the D3D shader object is created, and may then have to sync on those threads when a draw/dispatch that uses the shader is issued (which can cause hitching). These background threads can compete with the game's threads, so if a game already creates its shaders on an async loading thread, you may be better off telling the driver not to spawn its own tasks.
NVAPI function(s): N/A
AGS function(s): agsDriverExtensionsDX11_SetMaxAsyncCompileThreadCount, agsDriverExtensionsDX11_NumPendingAsyncCompileJobs, agsDriverExtensionsDX11_SetDiskShaderCacheEnabled
IGFX function(s): N/A
Support (NV / AMD / Intel): N/A / Southern Islands / None
D3D11.x/D3D12 support: None (D3D12 already forbids drivers from creating background threads to compile shaders; all compilation happens at PSO creation time).
References:
https://gpuopen-librariesandsdks.github.io/ags/group__shadercompiler.html
========================= Shader Hacks ==============================================
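Each entry below uses the same fields as the rendering hacks: the feature, a description, the NVAPI/AGS/IGFX functions that expose it, the minimum supported GPU for each vendor (NV / AMD / Intel), the feature's status in D3D11.x/D3D12, and references.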

PixelSync
Description: Enforces ordered access for UAV r/w operations based on primitive submission order. Useful for OIT, volumetric shadows, programmable blending, voxelization, and solving world hunger.
NVAPI function(s): N/A
AGS function(s): N/A
IGFX function(s): IntelExt_BeginPixelShaderOrdering, IntelExt_BeginPixelShaderOrderingOnUAV
Support (NV / AMD / Intel): None / None / Haswell
D3D11.x/D3D12 support: Rasterizer Ordered Views provide equivalent functionality in D3D11.3 and D3D12 with FL 12_1.
References:
http://advances.realtimerendering.com/s2013/2013-07-23-SIGGRAPH-PixelSync.pdf
https://software.intel.com/en-us/articles/programmable-blend-with-pixel-shader-ordering
https://software.intel.com/en-us/blogs/2013/07/18/order-independent-transparency-approximation-with-pixel-synchronization
https://software.intel.com/en-us/blogs/2013/03/27/adaptive-volumetric-shadow-maps
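
The reason ordering matters for programmable blending is that the "over" operator is not commutative. If two overlapping fragments perform their UAV read-modify-write in whatever order the GPU happens to schedule them, the result changes from run to run. This CPU sketch in Python (hypothetical names, premultiplied RGBA assumed) applies fragments in submission order, which is the guarantee PixelSync/ROVs provide.

```python
def over(dst, src):
    """Premultiplied-alpha 'over': src composited on top of dst, both (r, g, b, a)."""
    return tuple(s + d * (1.0 - src[3]) for s, d in zip(src, dst))


def blend_fragments(background, fragments):
    """Programmable blending as a read-modify-write of one UAV pixel. The
    fragments must be applied in primitive submission order; without
    ordering, the result depends on GPU scheduling."""
    pixel = background
    for frag in fragments:
        pixel = over(pixel, frag)
    return pixel


black = (0.0, 0.0, 0.0, 1.0)
red = (0.5, 0.0, 0.0, 0.5)   # 50% red, premultiplied
blue = (0.0, 0.0, 0.5, 0.5)  # 50% blue, premultiplied
```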

"Fast" Geometry Shader
Description: Allows creating a "pass-through" GS that can output a mask indicating which viewports a triangle should be rasterized to. Useful for multi-resolution VR, cubemap rendering, and voxelization. Graphics programmers remain skeptical after 9 years of being let down by geometry shaders.
NVAPI function(s): NvAPI_D3D11_CreateFastGeometryShader, NvAPI_D3D11_CreateFastGeometryShaderExplicit
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: None. (NVAPI has a D3D12 extension)
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
https://developer.nvidia.com/virtual-reality-development
https://developer.nvidia.com/sites/default/files/akamai/gameworks/vr/GameWorks_VR_2015_Final_handouts.pdf
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_geometry_shader_passthrough.txt
https://www.opengl.org/registry/specs/NV/viewport_array2.txt
Also, see Nvidia's Multi-ResVR sample included in the GameWorks VR SDK.
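
Below is the kind of per-triangle viewport mask a pass-through GS might compute for multi-resolution rendering, emulated in Python. The 3x3 layout, the edge positions, and the function name are hypothetical. Testing the triangle's bounding box is conservative: the mask can include a cell the triangle itself misses, which only costs some extra rasterization work.

```python
def viewport_mask(tri, x_edges, y_edges):
    """Bitmask of the grid cells (viewports) whose rectangles the triangle's
    bounding box overlaps. Bit i is viewport i in row-major order.

    x_edges / y_edges: sorted cell boundaries, e.g. [0, 0.25, 0.75, 1] for a
    3x3 multi-res layout with a large center cell."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    cols = len(x_edges) - 1
    mask = 0
    for row in range(len(y_edges) - 1):
        for col in range(cols):
            if (min(xs) < x_edges[col + 1] and max(xs) > x_edges[col] and
                    min(ys) < y_edges[row + 1] and max(ys) > y_edges[row]):
                mask |= 1 << (row * cols + col)
    return mask


EDGES = [0.0, 0.25, 0.75, 1.0]
```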

Lane Shuffles
Description: Performs a SIMD shuffle between the lanes of a warp, or a subgroup inside of a warp. Can broadcast the value from one lane to all lanes, shuffle based on a delta, or shuffle based on an XOR of the lane ID. Useful for fast reductions that don't require shared memory. Also useful for letting the world know that you're ready to stop pretending that GPUs aren't SIMD.
NVAPI function(s): NvShfl, NvShflUp, NvShflDown, NvShflXor
AGS function(s): AmdDxExtShaderIntrinsics_ReadfirstlaneF/U, AmdDxExtShaderIntrinsics_ReadlaneF/U, AmdDxExtShaderIntrinsics_SwizzleF/U
IGFX function(s): IntelExt_WaveReadLaneFirst, IntelExt_WaveReadLaneAt, IntelExt_QuadReadAcrossDiagonal, IntelExt_QuadReadLaneAt, IntelExt_QuadReadAcrossX, IntelExt_QuadReadAcrossY
Support (NV / AMD / Intel): Kepler / Southern Islands / Haswell?
D3D11.x/D3D12 support: Supported in D3D12 with Shader Model 6.0.
References:
http://devblogs.nvidia.com/parallelforall/faster-parallel-reductions-kepler/
https://www.opengl.org/registry/specs/NV/shader_thread_shuffle.txt
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-shuffle-functions
https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
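
To show why shuffles make reductions cheap, here is a Python emulation of a warp as a list with one entry per lane. The shuffle semantics mirror CUDA's (for shfl_down, lanes whose source would be out of range keep their own value). The function names are hypothetical stand-ins for intrinsics like NvShflXor and NvShflDown. The XOR "butterfly" sums a 32-lane warp in 5 steps with no shared memory, and leaves the total in every lane.

```python
def shfl_xor(values, lane_mask):
    """Every lane reads the value held by lane (lane_id ^ lane_mask)."""
    return [values[i ^ lane_mask] for i in range(len(values))]


def shfl_down(values, delta):
    """Every lane reads from lane_id + delta; lanes past the end keep their own value."""
    n = len(values)
    return [values[i + delta] if i + delta < n else values[i] for i in range(n)]


def butterfly_sum(values):
    """XOR-shuffle reduction over a power-of-two-sized warp; every lane ends
    up holding the total."""
    offset = len(values) // 2
    while offset:
        shuffled = shfl_xor(values, offset)
        values = [a + b for a, b in zip(values, shuffled)]
        offset //= 2
    return values
```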

Lane Voting
Description: Allows usage of ballot/any/all functionality in a shader. Any() returns true if a value is true on any lane of a warp. All() returns true if a value is true on all lanes of a warp. Ballot() returns a bitfield where each bit indicates whether the specified value was true for the corresponding lane of the warp.
NVAPI function(s): NvAny, NvAll, NvBallot
AGS function(s): AmdDxExtShaderIntrinsics_Ballot, AmdDxExtShaderIntrinsics_BallotAny, AmdDxExtShaderIntrinsics_BallotAll
IGFX function(s): IntelExt_WaveActiveBallot, IntelExt_WaveActiveAllTrue, IntelExt_WaveActiveAllEqual, IntelExt_WaveAll
Support (NV / AMD / Intel): Fermi / Southern Islands / Haswell?
D3D11.x/D3D12 support: Supported in D3D12 with Shader Model 6.0.
References:
http://docs.nvidia.com/cuda/cuda-c-programming-guide/#warp-vote-functions
https://www.opengl.org/registry/specs/NV/shader_thread_group.txt
https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
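
The three votes can be emulated in Python with an explicit active-lane mask (bit i = lane i). Lanes that are inactive, for example because of divergent control flow, never contribute. Any and All both fall out of the ballot. The names are hypothetical.

```python
def ballot(predicates, active_mask):
    """Bit i is set if lane i is active and its predicate is true."""
    mask = 0
    for lane, p in enumerate(predicates):
        if p and active_mask & (1 << lane):
            mask |= 1 << lane
    return mask


def any_lane(predicates, active_mask):
    """True if the predicate holds on at least one active lane."""
    return ballot(predicates, active_mask) != 0


def all_lanes(predicates, active_mask):
    """True if the predicate holds on every active lane."""
    return ballot(predicates, active_mask) == active_mask
```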

Lane ID
Description: Returns the current thread's lane ID within a warp/wavefront.
NVAPI function(s): NvGetLaneId
AGS function(s): AmdDxExtShaderIntrinsics_LaneId
IGFX function(s): IntelExt_WaveGetLaneIndex
Support (NV / AMD / Intel): Kepler / Southern Islands / Haswell?
D3D11.x/D3D12 support: Supported in D3D12 with Shader Model 6.0.
References:
https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
https://developer.nvidia.com/reading-between-threads-shader-intrinsics
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
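
Count Active Lanes
Description: Counts the number of active lanes within a warp/wavefront that have an index less than the current lane's.
NVAPI function(s): N/A
AGS function(s): AmdDxExtShaderIntrinsics_MBCnt
IGFX function(s): N/A
Support (NV / AMD / Intel): None / Southern Islands / None
D3D11.x/D3D12 support: Supported in D3D12 with Shader Model 6.0.
References:
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/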
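
A classic use is wave-level stream compaction. Each lane that keeps its element writes it to slot "number of kept lanes below me", so no atomics or shared memory are needed. This Python emulation assumes the lane mask comes from a ballot; `mbcnt` and `compact` are hypothetical names.

```python
def mbcnt(mask, lane):
    """Number of set bits in `mask` that belong to lanes below `lane`."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def compact(values, keep):
    """Wave-level stream compaction: every kept lane computes its output slot
    from the mask of kept lanes, so the outputs are dense and in lane order."""
    mask = sum(1 << lane for lane, k in enumerate(keep) if k)  # stands in for a ballot
    out = [None] * bin(mask).count("1")
    for lane, (v, k) in enumerate(zip(values, keep)):
        if k:
            out[mbcnt(mask, lane)] = v
    return out
```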
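
FP32 Atomics
Description: Perform atomic adds on fp32 values in RWByteAddressBuffers or RWTextures.
NVAPI function(s): NvInterlockedAddFp32
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Kepler / None / None
D3D11.x/D3D12 support: None.
References:
https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl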
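
Without this extension, the usual fallback is a compare-and-swap loop on the float's bit pattern. In HLSL that means asuint/asfloat plus InterlockedCompareExchange on a RWByteAddressBuffer. Below is a Python emulation of that loop, where `memory` is a hypothetical list of uint32 words. It runs single-threaded, so the retry branch never triggers here, but it marks where contention would be handled.

```python
import struct


def f32_to_bits(f):
    return struct.unpack("<I", struct.pack("<f", f))[0]


def bits_to_f32(u):
    return struct.unpack("<f", struct.pack("<I", u))[0]


def interlocked_compare_exchange(memory, addr, compare, value):
    """Stand-in for a 32-bit integer CAS; returns the original word."""
    original = memory[addr]
    if original == compare:
        memory[addr] = value
    return original


def atomic_add_f32(memory, addr, x):
    """Float atomic add built from an integer CAS loop. Returns the value
    stored before the add, like InterlockedAdd's original_value."""
    expected = memory[addr]
    while True:
        desired = f32_to_bits(bits_to_f32(expected) + x)
        original = interlocked_compare_exchange(memory, addr, expected, desired)
        if original == expected:
            return bits_to_f32(original)
        expected = original  # Another thread won the race: retry from its result.
```

Under contention every failed CAS costs another round trip to memory, which is exactly what a native fp32 atomic avoids.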

FP16 Atomics
Description: Performs atomic add/min/max on groups of 2 or 4 fp16 values in RWByteAddressBuffers or RWTextures.
NVAPI function(s): NvInterlockedAddFp16x2, NvInterlockedMinFp16x2, NvInterlockedMaxFp16x2, NvInterlockedAddFp16x4, NvInterlockedMinFp16x4, NvInterlockedMaxFp16x4
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Maxwell 2.0 / None / None
D3D11.x/D3D12 support: None.
References:
http://developer.download.nvidia.com/assets/events/GDC15/GEFORCE/Maxwell_Archictecture_GDC15.pdf
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_shader_atomic_fp16_vector.txt
https://developer.nvidia.com/unlocking-gpu-intrinsics-hlsl
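
This Python sketch is a non-atomic reference for what an fp16x2 add does to one 32-bit word: both halves are updated together and each result is rounded to fp16. The real intrinsics perform the read-modify-write atomically. Placing the first value in the low 16 bits is an assumption, and the helper names are hypothetical.

```python
import struct


def pack_f16x2(lo, hi):
    """Pack two floats as fp16 into one uint32: `lo` in bits 0-15, `hi` in bits 16-31."""
    return struct.unpack("<I", struct.pack("<ee", lo, hi))[0]


def unpack_f16x2(u):
    """Inverse of pack_f16x2: returns (lo, hi) as Python floats."""
    return struct.unpack("<ee", struct.pack("<I", u))


def add_f16x2(word, a, b):
    """Reference (non-atomic) fp16x2 add on one 32-bit word."""
    lo, hi = unpack_f16x2(word)
    return pack_f16x2(lo + a, hi + b)
```

fp16 has only an 11-bit significand, so small additions to large accumulated values can disappear entirely (2048 + 0.5 stays 2048). That is worth keeping in mind when accumulating many contributions this way.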

UAV Typed Loads
Description: Allows reading from Texture UAVs that have formats other than R32_UINT/R32_SINT/R32_FLOAT*, effectively bypassing The Most Annoying Restriction In The History Of Graphics APIs™.
*Stock D3D11 does allow aliasing most 32-bit formats (such as R8G8B8A8_UNORM) as R32_UINT, allowing for manual packing and unpacking.
NVAPI function(s): NvLoadUavTyped
AGS function(s): N/A
IGFX function(s): N/A
Support (NV / AMD / Intel): Fermi? / None / None
D3D11.x/D3D12 support: Natively supported for at least 18 formats in D3D11.3/D3D12 with FL 12_0, optional support for the rest.
References:
https://msdn.microsoft.com/en-us/library/windows/desktop/ff728749(v=vs.85).aspx
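
The manual packing and unpacking mentioned in the footnote looks roughly like this, emulated in Python. The R channel sits in the low byte, matching an R8G8B8A8_UNORM texel read through an R32_UINT view. The round-to-nearest rule used when packing is an assumption and may differ from D3D's exact conversion in tie cases. The function names are hypothetical; in HLSL the same shifts and masks would run on the uint loaded from the UAV.

```python
def pack_unorm4x8(r, g, b, a):
    """Pack four [0, 1] floats into one uint32 with R in the low byte, i.e. an
    R8G8B8A8_UNORM texel as seen through an R32_UINT view."""
    def to_byte(c):
        # Clamp, scale to [0, 255], round to nearest (ties round up).
        return int(min(max(c, 0.0), 1.0) * 255.0 + 0.5)
    return to_byte(r) | (to_byte(g) << 8) | (to_byte(b) << 16) | (to_byte(a) << 24)


def unpack_unorm4x8(u):
    """Inverse of pack_unorm4x8: (r, g, b, a) as floats in [0, 1]."""
    return tuple(((u >> shift) & 0xFF) / 255.0 for shift in (0, 8, 16, 24))
```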

3-parameter Min/Max/Med
Description: Returns the min, max, or median value from a set of 3 parameters.
NVAPI function(s): N/A
AGS function(s): AmdDxExtShaderIntrinsics_Min3F/U, AmdDxExtShaderIntrinsics_Med3F/U, AmdDxExtShaderIntrinsics_Max3F/U
IGFX function(s): N/A
Support (NV / AMD / Intel): None / Southern Islands / None
D3D11.x/D3D12 support: None. (AGS has D3D12 support)
References:
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
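
Without the intrinsic, a median of three takes several min/max operations. A Python sketch of the standard identity follows (the `med3` name is hypothetical). As a bonus, med3(x, lo, hi) is a clamp whenever lo <= hi.

```python
def med3(a, b, c):
    """Median of three using only min/max: the larger of min(a, b) and
    min(max(a, b), c)."""
    return max(min(a, b), min(max(a, b), c))
```

GCN exposes this as a single v_med3 instruction, which is what makes the intrinsic attractive for things like 3-tap median filters.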

Barycentrics and Interpolation
Description: Provides the pixel shader with access to the barycentrics used for interpolating vertex attributes, allowing for programmable interpolation. Also handy for implementing deferred rendering with visibility buffers.
NVAPI function(s): N/A
AGS function(s): AmdDxExtShaderIntrinsics_IjBarycentricCoords, AmdDxExtShaderIntrinsics_PullModelBarycentricCoords, AmdDxExtShaderIntrinsics_VertexParameter, AmdDxExtShaderIntrinsics_VertexParameterComponent
IGFX function(s): N/A
Support (NV / AMD / Intel): None / Southern Islands / None
D3D11.x/D3D12 support: Optional support in D3D12 with Shader Model 6.1. (AGS has D3D12 support)
References:
http://gpuopen.com/gcn-shader-extensions-for-direct3d-and-vulkan/
http://gpuopen.com/gaming-product/barycentrics12-dx12-gcnshader-ext-sample/
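
Given two barycentric coordinates and the three per-vertex values, programmable interpolation comes down to one formula. This Python sketch assumes i weights vertex 1 and j weights vertex 2, with vertex 0 receiving the remainder 1 - i - j. Check the AGS documentation for which vertex each coordinate actually refers to. `interpolate` is a hypothetical name.

```python
def interpolate(p0, p1, p2, i, j):
    """Interpolate a vertex attribute from (i, j) barycentrics:
    p0 + i * (p1 - p0) + j * (p2 - p0), applied per component.
    Assumes i weights vertex 1 and j weights vertex 2."""
    return tuple(a + i * (b - a) + j * (c - a) for a, b, c in zip(p0, p1, p2))
```

In a visibility-buffer renderer, the per-vertex values would be fetched from the triangle's vertex data instead of coming from the rasterizer, but the weighting is the same.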

Wave Reduction
Description: Performs an operation on all active lanes of the current wavefront (such as a sum, min, or max) and returns the result. It's simpler and faster than using thread group shared memory to do the same thing!
NVAPI function(s): N/A
AGS function(s): AmdDxExtShaderIntrinsics_WaveReduce, AmdDxExtShaderIntrinsics_WaveActiveSum, AmdDxExtShaderIntrinsics_WaveActiveProduct, AmdDxExtShaderIntrinsics_WaveActiveMin, AmdDxExtShaderIntrinsics_WaveActiveMax, AmdDxExtShaderIntrinsics_WaveActiveBitAnd, AmdDxExtShaderIntrinsics_WaveActiveBitOr, AmdDxExtShaderIntrinsics_WaveActiveBitXor
IGFX function(s): IntelExt_WaveActiveBitAnd, IntelExt_WaveActiveBitOr, IntelExt_WaveActiveCountBits, IntelExt_WaveActiveMax, IntelExt_WaveActiveMin, IntelExt_WaveActiveProduct, IntelExt_WaveActiveSum
Support (NV / AMD / Intel): None / Southern Islands / Haswell?
D3D11.x/D3D12 support: Native support in D3D12 with Shader Model 6.0.
References:
https://gpuopen.com/amd-gpu-services-5-1-1/
https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
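
A Python emulation of the semantics, with hypothetical names. The operation is applied across the active lanes only (bit i of the mask = lane i), and the single result is broadcast back to every active lane.

```python
import operator
from functools import reduce


def wave_active_reduce(op, values, active_mask):
    """Apply `op` across the active lanes and broadcast the result to every
    active lane; inactive lanes get None. A running wave always has at
    least one active lane, so `active` is never empty."""
    active = [v for lane, v in enumerate(values) if (active_mask >> lane) & 1]
    result = reduce(op, active)
    return [result if (active_mask >> lane) & 1 else None for lane in range(len(values))]


def wave_active_sum(values, active_mask):
    return wave_active_reduce(operator.add, values, active_mask)
```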

Wave Scan
Description: Similar to wave reductions, except these perform the operation on all active lanes prior to your own lane. So if you ran a prefix sum on lane 4, it would give you the sum of the values from lanes 0, 1, 2, and 3 (assuming they're all active). A postfix sum would do the same, but would also include your own lane.
NVAPI function(s): N/A
AGS function(s): AmdDxExtShaderIntrinsics_WaveScan, AmdDxExtShaderIntrinsics_WavePrefixSum, AmdDxExtShaderIntrinsics_WavePrefixProduct, AmdDxExtShaderIntrinsics_WavePrefixMin, AmdDxExtShaderIntrinsics_WavePrefixMax, AmdDxExtShaderIntrinsics_WavePostfixSum, AmdDxExtShaderIntrinsics_WavePostfixProduct, AmdDxExtShaderIntrinsics_WavePostfixMin, AmdDxExtShaderIntrinsics_WavePostfixMax
IGFX function(s): IntelExt_WavePrefixCountBits, IntelExt_WavePrefixProduct, IntelExt_WavePrefixSum
Support (NV / AMD / Intel): None / Southern Islands / Haswell?
D3D11.x/D3D12 support: Native support in D3D12 with Shader Model 6.0 (no postfix intrinsics, but these can be trivially implemented on your own).
References:
https://gpuopen.com/amd-gpu-services-5-1-1/
https://github.com/intel/intel-graphics-compiler/blob/master/inc/IntelExtensions.hlsl
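
A Python emulation of both scan flavors over an active-lane mask (hypothetical names). The prefix scan is exclusive: each lane sees only the active lanes below it. The postfix scan is inclusive and is simply the prefix result combined with the lane's own value. That same trick is how you would build a postfix sum in Shader Model 6.0, as WavePrefixSum(x) + x.

```python
def wave_prefix(op, identity, values, active_mask):
    """Exclusive ('prefix') scan over active lanes: each active lane gets `op`
    folded over the values of the active lanes below it (identity for the
    lowest active lane). Inactive lanes get None."""
    out, running = [], identity
    for lane, v in enumerate(values):
        if (active_mask >> lane) & 1:
            out.append(running)
            running = op(running, v)
        else:
            out.append(None)
    return out


def wave_postfix(op, identity, values, active_mask):
    """Inclusive ('postfix') scan: the prefix result combined with the lane's own value."""
    prefix = wave_prefix(op, identity, values, active_mask)
    return [None if p is None else op(p, v) for p, v in zip(prefix, values)]
```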