MAXIM: Multi-Axis MLP for Image Processing

Zhengzhong Tu¹, Hossein Talebi², Han Zhang², Feng Yang², Peyman Milanfar², Alan Bovik¹, Yinxiao Li²
¹University of Texas at Austin  ²Google Research

Problem Statement

Develop efficient Transformers/MLPs for low-level vision.

  • Low-level vision tasks such as denoising, deblurring, and dehazing require high-resolution, image-to-image processing
  • Vision Transformers/MLPs are promising on high-level tasks, but adapting them to low-level (image processing) problems is non-trivial
  • The model needs to be 'fully-convolutional', i.e., trainable on small patches while running inference at full resolution; otherwise it produces patch-boundary artifacts [R1]

Our Method: MAXIM Architecture

Our proposed MAXIM model is:

  • A global UNet-like architecture with multi-stage stacks
  • Every block enjoys global-local spatial interaction
  • 'Fully-convolutional', i.e., trainable on small patches and directly applicable at any high resolution, without causing patch-boundary effects (see the toy illustration after this section)
  • Linear in complexity w.r.t. input image size, unlike other MLP models
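The patch-boundary issue is easy to see with a toy example (ours, not from the paper): any operator with spatial context disagrees with full-image inference exactly at the seams when patches are processed independently. A minimal JAX sketch:

```python
# Toy illustration (not from the paper): stitching independently processed
# patches creates seams for any operator with spatial context.
import jax
import jax.numpy as jnp

def mean3x3(x):
    """3x3 mean filter with zero padding; x: (H, W)."""
    p = jnp.pad(x, 1)
    H, W = x.shape
    return sum(p[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0

key = jax.random.PRNGKey(0)
img = jax.random.normal(key, (64, 64))

full = mean3x3(img)                       # full-resolution inference

# Patch-wise inference: filter each 32x32 quadrant independently, then stitch.
q = [mean3x3(img[i:i + 32, j:j + 32]) for i in (0, 32) for j in (0, 32)]
stitched = jnp.block([[q[0], q[1]], [q[2], q[3]]])

err = jnp.abs(full - stitched)
print(err[31:33, :].max())                # > 0: artifacts along the seam
print(err[:30, :30].max())                # 0.0: interiors agree exactly
```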

Core Module 1: Multi-Axis Gated MLP Block

  • Contains a local branch (2nd axis) and a global branch (1st axis)
  • Applies gMLP along one axis at a time in each branch
  • Global and 'fully-convolutional' with linear complexity
  • A standalone module that can be plugged into many networks (see the sketch below)
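A minimal JAX sketch of the multi-axis gating idea (hypothetical names such as gating_unit, multi_axis_block, w_local, and w_global; this is not the released MAXIM code, which adds dense projections, LayerNorm, and residual connections):

```python
# Minimal sketch of multi-axis gated mixing (assumed shapes and names).
import jax
import jax.numpy as jnp

def gating_unit(x, w):
    """Simplified gMLP-style spatial gating on x: (..., n, c).
    Projects along the token axis with w: (n, n), then gates the input.
    (gMLP also splits channels first; omitted here for brevity.)"""
    v = jnp.einsum('tn,...nc->...tc', w, x)
    return x * v

def multi_axis_block(x, w_local, w_global, b=4, g=4):
    """x: (H, W, C) with H, W divisible by b and g.
    Local half: gate within b x b windows (the '2nd axis').
    Global half: gate across a fixed g x g grid (the '1st axis')."""
    H, W, C = x.shape
    xl, xg = jnp.split(x, 2, axis=-1)        # half the channels per branch

    # Local branch: non-overlapping b x b windows; mix the b*b pixels.
    l = xl.reshape(H // b, b, W // b, b, C // 2)
    l = l.transpose(0, 2, 1, 3, 4).reshape(-1, b * b, C // 2)
    l = gating_unit(l, w_local)
    l = l.reshape(H // b, W // b, b, b, -1).transpose(0, 2, 1, 3, 4).reshape(H, W, -1)

    # Global branch: a fixed g x g grid; mix the g*g cells at each position.
    sh, sw = H // g, W // g
    r = xg.reshape(g, sh, g, sw, C // 2)
    r = r.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C // 2)
    r = gating_unit(r, w_global)
    r = r.reshape(sh, sw, g, g, -1).transpose(2, 0, 3, 1, 4).reshape(H, W, -1)

    return jnp.concatenate([l, r], axis=-1)

# The mixing matrices have fixed sizes (b*b and g*g) regardless of H and W,
# so the same weights apply at any divisible resolution and the cost grows
# linearly with the number of pixels:
key = jax.random.PRNGKey(0)
w_l = 0.02 * jax.random.normal(key, (16, 16))    # b*b = 16
w_g = 0.02 * jax.random.normal(key, (16, 16))    # g*g = 16
y1 = multi_axis_block(jax.random.normal(key, (32, 32, 8)), w_l, w_g)
y2 = multi_axis_block(jax.random.normal(key, (128, 128, 8)), w_l, w_g)
```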

Core Module 2: Cross-Gating MLP Block

  • Same design as Core Module 1, but extended to let two features interact
  • The G(.) function extracts multi-axis gating signals only; gating is then applied reciprocally between the two features (see the sketch below)
  • Can be used as a conditioning layer or a fusion module
  • Also global and 'fully-convolutional' with linear complexity
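A matching sketch of the reciprocal gating (again hypothetical names; the real block also applies the local/global blocking from Core Module 1 plus input/output projections):

```python
import jax
import jax.numpy as jnp

def cross_gating(x, y, w_x, w_y):
    """Reciprocal cross-gating sketch on blocked features x, y: (..., n, c).
    The projections play the role of G(.): they produce gating signals only,
    and each feature is modulated by the *other* feature's signal."""
    gx = jnp.einsum('tn,...nc->...tc', w_x, x)   # gating signal from x
    gy = jnp.einsum('tn,...nc->...tc', w_y, y)   # gating signal from y
    return x * gy, y * gx                        # applied reciprocally

# Example: fuse two blocked feature maps (a batch of 10 windows, 16 tokens).
key = jax.random.PRNGKey(1)
k1, k2, k3, k4 = jax.random.split(key, 4)
fx = jax.random.normal(k1, (10, 16, 4))
fy = jax.random.normal(k2, (10, 16, 4))
w_x = 0.02 * jax.random.normal(k3, (16, 16))
w_y = 0.02 * jax.random.normal(k4, (16, 16))
ox, oy = cross_gating(fx, fy, w_x, w_y)          # shapes preserved
```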

Numerical results

Evaluated on five low-level vision tasks; state of the art on 15 of 20 datasets.

Visual results

More at arxiv.org/abs/2201.02973.

Summary/Conclusion

  • MAXIM: an efficient multi-axis MLP backbone for low-level vision, with global-local spatial interaction in every block
  • 'Fully-convolutional' with linear complexity: trains on small patches, runs at full resolution without patch-boundary artifacts
  • State of the art on 15 of 20 datasets across five tasks
  • Paper, code, and a web demo are linked below

References

[R1] Chen et al., "Pre-Trained Image Processing Transformer," CVPR 2021. arxiv.org/abs/2012.00364

[Figure: spatial mixing patterns compared across Mixer/gMLP, Swin-Mixer, and MAXIM (ours)]

Links: 📜 Paper · 🌟 Code · Tweet · Zhihu · Check out the web demo!
