FSR 4
FSR 4 Performance Mode with Frame Generation Shown in Call of Duty Black Ops 6 (20:41)
Yesterday, in AMD’s RX 9070 XT launch presentation, they briefly discussed the technical details of FSR 4 and the nature of its machine learning model.
Question: “Is FSR 4 a transformer model like the new DLSS 4?”
Answer:
Yes and no - it is highly likely to be a novel “transformer lite” type of model that is influenced by the design of transformer models, but retains a convolutional neural network (CNN) backbone similar to DLSS 2, XeSS, and PSSR.
Quotes from AMD’s announcement:
1. “Our new technology leverages a proprietary hybrid model resulting from extensive research across different types and combinations of neural networks and unique training techniques.” (19:30)
2. “... and optimized it [FSR 4] for the new FP8 ML acceleration in RDNA 4.” (19:25)
3. “FSR 4 uses the new FP8 data type in the RDNA 4 architecture to balance quality and performance.” (19:39)
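To make the quality-versus-performance trade-off of FP8 a bit more concrete, here is a minimal sketch of rounding a number to an 8-bit float. AMD did not say which FP8 encoding FSR 4 uses, so the E4M3 layout (4 exponent bits, 3 mantissa bits, largest value 448) and the function name quantize_e4m3 are assumptions for illustration only.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the assumed E4M3 layout
E4M3_MIN_EXP = -6     # exponent of the smallest normal value
MANTISSA_BITS = 3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3 (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)             # saturate instead of overflowing
    _, e = math.frexp(a)                  # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, E4M3_MIN_EXP)        # tiny values share the subnormal step
    step = 2.0 ** (exp - MANTISSA_BITS)   # spacing between neighbouring FP8 values
    return sign * min(round(a / step) * step, E4M3_MAX)

# Show how much precision survives the round trip.
for value in (1.0, 0.3, 3.14159, 1000.0):
    print(value, "->", quantize_e4m3(value))
```

With only three mantissa bits the rounding error is visible on ordinary values, which is why a model has to be trained or tuned with the format in mind; the payoff is that each weight and activation takes a single byte.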
Evidence:
Recent Research Papers:
“A Simple Transformer-style Network for Lightweight Image Super-resolution”
(2023)
“...recently developed methods are computationally expensive and need much more memory. To solve this issue, we propose a simple Transformer-style network (STSN) for the image super resolution (SR) task. The idea of this method is based on using convolutional modulation (Conv2Former), which is a very simple block with a linearly compared to quadratically as in Transformers.” (yes I know it sounds like it’s missing a word but it’s not)
Analysis:
The result is a model that blends Transformer concepts with CNNs.
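The "linear compared to quadratic" point from the STSN quote can be sketched in a few lines. This is a simplified 1D, single-channel toy of convolutional modulation (a convolution output multiplied element-wise with a value branch), not the paper's network; the names conv_modulation, w_a and w_v are made up for this example.

```python
def conv_modulation(x, kernel, w_a=1.0, w_v=1.0):
    """Toy convolutional modulation: out[i] = conv(w_a * x)[i] * (w_v * x[i])."""
    n, k = len(x), len(kernel)
    pad = k // 2
    out = []
    for i in range(n):
        a = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < n:              # zero padding at the borders
                a += kernel[j] * (w_a * x[idx])
        out.append(a * (w_v * x[i]))      # the "attention" is just a multiply
    return out

def ops_conv_modulation(n, k):
    return n * k      # each position looks at k neighbours: linear in n

def ops_self_attention(n):
    return n * n      # each position looks at every position: quadratic in n

print(conv_modulation([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
for n in (1_000, 1_000_000):
    print(n, ops_conv_modulation(n, 11), ops_self_attention(n))
```

The kernel size of 11 in the loop is an arbitrary choice; the point is only that the cost grows with n times a fixed k rather than n squared.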
“Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution”
(2023)
“...we propose a neighborhood attention (NA) module that upgrades the standard convolution with a self-attention mechanism. The NA module efficiently extracts long-range dependencies in a sliding window pattern, thereby achieving similar performance to large convolutional kernels but with fewer parameters.”
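As a rough illustration of the sliding-window idea in that quote, here is a 1D sketch where each position only attends to its nearby neighbours. It is a simplification under my own assumptions (the window is simply cut off at the edges, and the function name and radius parameter are hypothetical), not the NA module from the paper.

```python
import math

def neighborhood_attention(q, k, v, radius):
    """Softmax attention where position i only sees positions i-radius..i+radius."""
    n, dim = len(q), len(v[0])
    scale = math.sqrt(len(q[0]))
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale
                  for j in range(lo, hi)]
        top = max(scores)                           # subtract max for stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[lo + t][d] for t, w in enumerate(weights)) / total
                    for d in range(dim)])
    return out

q = [[0.0], [0.0], [0.0]]
k = [[1.0], [2.0], [3.0]]
v = [[1.0], [2.0], [3.0]]
print(neighborhood_attention(q, k, v, radius=1))
```

With all-zero queries every neighbour gets the same weight, so the output is just a local average; learned queries and keys are what turn this into a content-dependent filter.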
Analysis:
“Single-image super-resolution using lightweight transformer-convolutional neural network hybrid model”
(2023)
“These CNN-based methods cannot fully use the internal and external information of the image. The authors add a lightweight Transformer structure to capture this information.”
“The lightweight transformer block (LTB) further extracts features and learns the texture details between the patches through the self-attention mechanism.”
“In the LTB, we stack EMT blocks to capture long-term similarity information in feature maps.”
“We also compare the trade-off between the performance and the number of network parameters from our work and the existing methods. Figure 7 shows the PSNR performances of 10 methods versus the number of parameters, where the results are evaluated with the Set5 dataset for × 4 upscaling factor. We can find that our method significantly outperforms the relatively small models across this dataset and scale. Moreover, our method performs better than EDSR [28] and RDN [37] for × 4 upscaling factor, but with about 90% and 80% fewer parameters on average, respectively. Furthermore, compared with RCAN [29] on four upscaling factors, our model has fewer parameters and achieves higher PSNR. These comparisons indicate that our proposed network has a better trade-off between performance and model size.”
“Furthermore, in terms of FLOPs, our model is more economical than CNLN [31], EDSR, and image super-resolution via deep recursive residual network (DRRN) [58], and its performance is superior to these three methods. Although MemNet [59] and VDSR use fewer FLOPs, our approach obtains better performance and executes faster.”
Analysis:
Overall:
FSR 4 on RDNA 3
(“FSR 4 Lite”)
Possible/probable architecture outlined in:
“Single-image super-resolution using lightweight transformer-convolutional neural network hybrid model” (cont)
“Hence, we devise a simple channel attention mechanism to effectively capture the texture and details of high dimensional features in the HR space, thereby constructing the DAB.”
“…the DAB is indispensable for producing SR [super resolution] images with highly detailed visual features. This is because the LR [Low Resolution] space contains limited information, and DAB [Detail Attention Block] can compensate for the missing critical local information by extracting the corresponding features in the HR [High Resolution] space.”
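For readers who have not seen channel attention before, here is a generic squeeze-and-gate sketch: average each channel, turn the averages into a 0..1 gate, and scale the channel by it. The paper does not spell out the DAB in the quotes above, so this layout, the function name, and the weights/biases parameters are all assumptions rather than the paper's design.

```python
import math

def channel_attention(features, weights, biases):
    """features: list of C channels, each a list of rows (H x W).
    weights: C x C mixing matrix, biases: length-C list (both hypothetical)."""
    # Squeeze: one average per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in features]
    # Gate: a sigmoid of a linear mix of the channel averages.
    gates = []
    for c in range(len(features)):
        z = sum(w * p for w, p in zip(weights[c], pooled)) + biases[c]
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Scale: every pixel in channel c is multiplied by that channel's gate.
    return [[[gates[c] * value for value in row] for row in ch]
            for c, ch in enumerate(features)]

features = [[[2.0, 4.0]], [[1.0, 1.0]]]      # 2 channels, 1 x 2 pixels each
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(channel_attention(features, zeros, [0.0, 0.0]))
```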
Also highlighted in a different paper:
“A Simple Transformer-style Network for Lightweight Image Super-resolution” (cont)
“In this task, all the contents of the conv2Former block are removed, except of the 3 × 3 to indicate the impact of the attention module, as indicated in Fig. 3b (Model 2). The obtained results are indicated in Table 2, where 1st row represents the results of using the Conv2Former, and 3rd row represents the results without using the attention module. The results show that the attention module has a big impact on performance. For instance, the PSNR dropped from 33.77 dB to 33.61 dB on the Set14 dataset. So, these results show that the attention module can greatly impacts the performance.” (yes the grammar error is in the paper)
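To put the 33.77 dB versus 33.61 dB figures in context, here is the standard PSNR formula plus a helper that converts a dB gap into a mean-squared-error ratio. The helper name mse_ratio is made up for this sketch, and the 8-bit peak value of 255 is an assumption about how the images are stored.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 10.0 * math.log10(peak * peak / mse)

def mse_ratio(psnr_high, psnr_low):
    """How many times larger the MSE is at psnr_low than at psnr_high."""
    return 10.0 ** ((psnr_high - psnr_low) / 10.0)

# The ablation numbers quoted from the paper (Set14).
print(mse_ratio(33.77, 33.61))
```

Because PSNR is logarithmic, a gap of 0.16 dB corresponds to the ablated model having a few percent more squared error, which is a useful scale to keep in mind when papers compare results to two decimal places.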
Moving on to another new technology shown by AMD in their RX 9070 XT launch event,
Neural Supersampling and Denoising
Some background first:
In the era of the GTX 1080 Ti (A.D. 2017), before the RTX series of GPUs, Nvidia released a paper on denoising path-traced images using traditional hand-coded algorithms (not machine learning), called:
“Spatiotemporal variance-guided filtering: real-time reconstruction for path-traced global illumination”
“We introduce a reconstruction algorithm that generates a temporally stable sequence of images from one path-per-pixel global illumination. To handle such noisy input, we use temporal accumulation to increase the effective sample count and spatiotemporal luminance variance estimates to drive a hierarchical, image-space wavelet filter. This hierarchy allows us to distinguish between noise and detail at multiple scales using local luminance variance.” - their alternative to a machine learning/AI model
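The "temporal accumulation" and "luminance variance" parts of that abstract can be sketched for a single pixel as below. This leaves out everything that makes SVGF work in practice (motion-vector reprojection, the wavelet filter, history rejection), and the blend factor alpha and its 0.2 default are assumptions for this toy, as is the function name.

```python
def accumulate(history, sample, alpha=0.2):
    """Blend a new noisy luminance sample into a per-pixel history.

    history is (mean, mean_of_squares); returns the new history and a
    variance estimate that a filter could use to decide how much to blur.
    """
    mean, second_moment = history
    mean = (1.0 - alpha) * mean + alpha * sample
    second_moment = (1.0 - alpha) * second_moment + alpha * sample * sample
    variance = max(0.0, second_moment - mean * mean)   # E[x^2] - E[x]^2
    return (mean, second_moment), variance

# One pixel receiving a noisy 1-sample-per-pixel signal over several frames.
history = (0.0, 0.0)
for frame_sample in (1.0, 0.0, 1.0, 1.0, 0.0, 1.0):
    history, variance = accumulate(history, frame_sample)
    print(round(history[0], 3), round(variance, 3))
```

A pixel whose history is steady ends up with low variance and can be left sharp, while a pixel that keeps flickering has high variance and gets filtered harder.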
Even before Nvidia announced ray reconstruction, which is their machine learning based denoiser integrated into DLSS, Intel released a research paper showcasing a technology very similar to DLSS ray reconstruction. Not a lot of people seem to know or talk about this. This is that paper:
“Temporally Stable Real-Time Joint Neural Denoising and Supersampling”
(Left: Nvidia's SVGF paper shown previously, Middle: Unreal Engine 4 Default, Right: Intel Upscaling-Denoising Tech)
“Recent advances in ray tracing hardware bring real-time path tracing into reach, and ray traced soft shadows, glossy reflections, and diffuse global illumination are now common features in games. Nonetheless, ray budgets are still limited. This results in undersampling, which manifests as aliasing and noise. Prior work addresses these issues separately. While temporal supersampling methods based on neural networks have gained a wide use in modern games due to their better robustness, neural denoising remains challenging because of its higher computational cost.”
“SVGF generally blurs the image too strongly. Fine details in the normal or roughness textures are blurred out. Nonetheless, there is residual low-frequent noise with a splotchy appearance. SVGF also struggles with specular signal components, since temporal accumulation with standard motion vectors leads to temporal lag under camera motion. In spite of the lower-resolution input, our method produces sharper results almost everywhere.”
(compares standard XeSS upscaling with joint denoising and super sampling technique)
Moving on to the present day,
“Neural Supersampling and Denoising for Real-time Path Tracing”
“The randomness of samples in Monte Carlo integration inherently produces noise when the scattered rays do not hit the light source after multiple bounces. Hence, many samples per pixel (spp) are required to achieve high quality pixels in Monte Carlo path tracing, often taking a couple of minutes or hours to render a single image. Although the higher number of samples per pixel, the higher chance of less noise in an image, in many cases even with several thousands of samples it still falls short to converge to high quality and shows visually annoying noise.”
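The relationship between samples per pixel and noise described in that quote can be shown with a toy pixel where each ray either reaches the light or does not. The hit probability, light value and function names are hypothetical; a real path tracer has far more complicated sample distributions.

```python
import math
import random

def render_pixel(spp, hit_probability=0.1, light_value=1.0, rng=None):
    """Average spp random samples; each sample hits the light or returns black."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(spp):
        if rng.random() < hit_probability:
            total += light_value
    return total / spp

def expected_noise(spp, hit_probability=0.1, light_value=1.0):
    """Standard error of the pixel estimate: sigma / sqrt(spp)."""
    sigma = light_value * math.sqrt(hit_probability * (1.0 - hit_probability))
    return sigma / math.sqrt(spp)

rng = random.Random(42)
for spp in (1, 16, 256, 4096):
    print(spp, render_pixel(spp, rng=rng), expected_noise(spp))
```

Noise only falls with the square root of the sample count, so halving it costs four times as many rays, which is the economic argument for denoising a low-spp image instead.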
“Neural denoisers use a deep neural network to predict denoising filter weights in a process of training on a large dataset. They are achieving remarkable progress in denoising quality compared to hand-crafted analytical denoising filters [2]. Depending on the complexity of a neural network and how it cooperates with other optimization techniques, neural denoisers are getting more attention to be used for real-time Monte Carlo path tracing.”
*[2]. AMD specifically references Nvidia’s SVGF paper as an example of an inferior technique in the “references” section of their blog haha xD
(not really a jab, because it’s a seven-year-old paper, which is a long time in computer graphics research and especially in AI research) I just thought it was interesting.
*They also reference Intel’s paper in the references section
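AMD's description of neural denoisers as networks that "predict denoising filter weights" can be sketched as follows: something produces a small set of weights per pixel, and the output pixel is a weighted average of its noisy neighbours. Here the logits are just passed in by hand as a stand-in for a network's output; the function name and the 1D layout are assumptions for this example, not AMD's design.

```python
import math

def apply_predicted_kernels(noisy, logits):
    """Filter a 1D noisy signal with a different softmax-normalised kernel per pixel.

    logits[i] holds the raw (unnormalised) weights for pixel i's neighbourhood.
    """
    n = len(noisy)
    radius = len(logits[0]) // 2
    out = []
    for i in range(n):
        top = max(logits[i])
        weights = [math.exp(value - top) for value in logits[i]]
        acc, total = 0.0, 0.0
        for j, w in enumerate(weights):
            idx = min(max(i + j - radius, 0), n - 1)   # clamp at the borders
            acc += w * noisy[idx]
            total += w
        out.append(acc / total)
    return out

noisy = [0.0, 3.0, 0.0]
blur = [[0.0, 0.0, 0.0]] * 3             # equal weights: average the neighbourhood
keep = [[-1000.0, 0.0, -1000.0]] * 3     # all weight on the centre: leave untouched
print(apply_predicted_kernels(noisy, blur))
print(apply_predicted_kernels(noisy, keep))
```

The appeal of predicting weights rather than colours is that the network decides per pixel how much to blur, so it can smooth noise in flat areas and hold back on edges and texture.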
Demoed in AMD’s RX 9070 XT reveal event in the “Toyshop” demo
Some Citations:
Yuanyuan Liu, Mengtao Yue, Han Yan, Lu Zhu: Single-image super-resolution using lightweight transformer-convolutional neural network hybrid model. (2023)
https://doi.org/10.1049/ipr2.12833
Gang Wu, Junjun Jiang, Yuanchao Bai, Xianming Liu: Incorporating Transformer Designs into Convolutions for Lightweight Image Super-Resolution. (2023)
https://doi.org/10.48550/arXiv.2303.14324
Garas Gendy, Nabil Sabor, Jingchao Hou, Guanghui He: A Simple Transformer-style Network for Lightweight Image Super-resolution. (2023)
https://doi.org/10.1109/CVPRW59228.2023.00153
Neural Supersampling and Denoising for Real-time Path Tracing. (2024)
https://gpuopen.com/learn/neural_supersampling_and_denoising_for_real-time_path_tracing/
Christoph Schied, Anton Kaplanyan, Chris Wyman, Anjul Patney, Chakravarty R. Alla Chaitanya, John Burgess, Shiqiu Liu, Carsten Dachsbacher, Aaron Lefohn, Marco Salvi: Spatiotemporal Variance-Guided Filtering: Real-Time Reconstruction for Path-Traced Global Illumination. (2017)
Manu Mathew Thomas, Gabor Liktor, Christoph Peters, SungYe Kim, Karthik Vaidyanathan, Angus G. Forbes: Temporally Stable Real-Time Joint Neural Denoising and Supersampling. (2022)
All You Need for Gaming – AMD RDNA™ 4 and RX 9000 Series Reveal
Toyshop Realtime Path Tracing Neural Rendering Tech Demo - YouTube
Osvaldo Pinali Doederlein’s Twitter Post