MAXIM: Multi-Axis MLP for Image Processing

Zhengzhong Tu¹, Hossein Talebi², Han Zhang², Feng Yang², Peyman Milanfar², Alan Bovik¹, Yinxiao Li²
¹University of Texas at Austin  ²Google Research

Problem Statement

Develop efficient Transformers/MLPs for low-level vision.

  • Low-level vision tasks such as denoising, deblurring, and dehazing require high-resolution, image-to-image processing
  • Vision Transformers/MLPs are promising on high-level tasks, but adapting them to low-level (image processing) problems is non-trivial
  • The model needs to be 'fully-convolutional', i.e., trainable on small patches while running inference at full resolution; otherwise it produces patch-boundary artifacts [R1]

Our Method: MAXIM Architecture

Our proposed MAXIM model is:

  • A global UNet-like architecture with multi-stage stacks
  • Every block enjoys global-local spatial interaction
  • 'Fully-convolutional', i.e., trainable on small patches and directly applicable at any high resolution, without causing patch-boundary effects (see the toy illustration after this section)
  • Linear in complexity w.r.t. input image size, unlike other MLP models
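The patch-boundary issue is easy to see with a toy example (ours, not from the paper): any operator with spatial context disagrees with full-image inference exactly at the seams when patches are processed independently. A minimal JAX sketch:

```python
# Toy illustration (not from the paper): stitching independently processed
# patches creates seams for any operator with spatial context.
import jax
import jax.numpy as jnp

def mean3x3(x):
    """3x3 mean filter with zero padding; x: (H, W)."""
    p = jnp.pad(x, 1)
    H, W = x.shape
    return sum(p[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0

key = jax.random.PRNGKey(0)
img = jax.random.normal(key, (64, 64))

full = mean3x3(img)                       # full-resolution inference

# Patch-wise inference: filter each 32x32 quadrant independently, then stitch.
q = [mean3x3(img[i:i + 32, j:j + 32]) for i in (0, 32) for j in (0, 32)]
stitched = jnp.block([[q[0], q[1]], [q[2], q[3]]])

err = jnp.abs(full - stitched)
print(err[31:33, :].max())                # > 0: artifacts along the seam
print(err[:30, :30].max())                # 0.0: interiors agree exactly
```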

Core Module 1: Multi-Axis Gated MLP Block

  • Contains a local branch (2nd axis) and a global branch (1st axis)
  • Applies gMLP along one axis at a time in each branch
  • Global and 'fully-convolutional' with linear complexity
  • A standalone module that can be plugged into many networks (see the sketch below)
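A minimal JAX sketch of the multi-axis gating idea (hypothetical names such as gating_unit, multi_axis_block, w_local, and w_global; this is not the released MAXIM code, which adds dense projections, LayerNorm, and residual connections):

```python
# Minimal sketch of multi-axis gated mixing (assumed shapes and names).
import jax
import jax.numpy as jnp

def gating_unit(x, w):
    """Simplified gMLP-style spatial gating on x: (..., n, c).
    Projects along the token axis with w: (n, n), then gates the input.
    (gMLP also splits channels first; omitted here for brevity.)"""
    v = jnp.einsum('tn,...nc->...tc', w, x)
    return x * v

def multi_axis_block(x, w_local, w_global, b=4, g=4):
    """x: (H, W, C) with H, W divisible by b and g.
    Local half: gate within b x b windows (the '2nd axis').
    Global half: gate across a fixed g x g grid (the '1st axis')."""
    H, W, C = x.shape
    xl, xg = jnp.split(x, 2, axis=-1)        # half the channels per branch

    # Local branch: non-overlapping b x b windows; mix the b*b pixels.
    l = xl.reshape(H // b, b, W // b, b, C // 2)
    l = l.transpose(0, 2, 1, 3, 4).reshape(-1, b * b, C // 2)
    l = gating_unit(l, w_local)
    l = l.reshape(H // b, W // b, b, b, -1).transpose(0, 2, 1, 3, 4).reshape(H, W, -1)

    # Global branch: a fixed g x g grid; mix the g*g cells at each position.
    sh, sw = H // g, W // g
    r = xg.reshape(g, sh, g, sw, C // 2)
    r = r.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C // 2)
    r = gating_unit(r, w_global)
    r = r.reshape(sh, sw, g, g, -1).transpose(2, 0, 3, 1, 4).reshape(H, W, -1)

    return jnp.concatenate([l, r], axis=-1)

# The mixing matrices have fixed sizes (b*b and g*g) regardless of H and W,
# so the same weights apply at any divisible resolution and the cost grows
# linearly with the number of pixels:
key = jax.random.PRNGKey(0)
w_l = 0.02 * jax.random.normal(key, (16, 16))    # b*b = 16
w_g = 0.02 * jax.random.normal(key, (16, 16))    # g*g = 16
y1 = multi_axis_block(jax.random.normal(key, (32, 32, 8)), w_l, w_g)
y2 = multi_axis_block(jax.random.normal(key, (128, 128, 8)), w_l, w_g)
```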

Core Module 2: Cross-Gating MLP Block

  • Same design as Core Module 1, but extended to let two features interact
  • The G(.) function extracts multi-axis gating signals only; gating is then applied reciprocally between the two features (see the sketch below)
  • Can be used as a conditioning layer or a fusion module
  • Also global and 'fully-convolutional' with linear complexity
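A matching sketch of the reciprocal gating (again hypothetical names; the real block also applies the local/global blocking from Core Module 1 plus input/output projections):

```python
import jax
import jax.numpy as jnp

def cross_gating(x, y, w_x, w_y):
    """Reciprocal cross-gating sketch on blocked features x, y: (..., n, c).
    The projections play the role of G(.): they produce gating signals only,
    and each feature is modulated by the *other* feature's signal."""
    gx = jnp.einsum('tn,...nc->...tc', w_x, x)   # gating signal from x
    gy = jnp.einsum('tn,...nc->...tc', w_y, y)   # gating signal from y
    return x * gy, y * gx                        # applied reciprocally

# Example: fuse two blocked feature maps (a batch of 10 windows, 16 tokens).
key = jax.random.PRNGKey(1)
k1, k2, k3, k4 = jax.random.split(key, 4)
fx = jax.random.normal(k1, (10, 16, 4))
fy = jax.random.normal(k2, (10, 16, 4))
w_x = 0.02 * jax.random.normal(k3, (16, 16))
w_y = 0.02 * jax.random.normal(k4, (16, 16))
ox, oy = cross_gating(fx, fy, w_x, w_y)          # shapes preserved
```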

Numerical results

Evaluated on five low-level vision tasks; state of the art on 15 of 20 datasets.

Visual results

More at arxiv.org/abs/2201.02973.

Summary/Conclusion

  • MAXIM: an efficient multi-axis MLP backbone for low-level vision, with global-local spatial interaction in every block
  • 'Fully-convolutional' with linear complexity: trains on small patches, runs at full resolution without patch-boundary artifacts
  • State of the art on 15 of 20 datasets across five tasks
  • Paper, code, and a web demo are linked below

References

[R1] Chen et al., "Pre-Trained Image Processing Transformer," CVPR 2021. arxiv.org/abs/2012.00364

[Figure: spatial mixing patterns compared across Mixer/gMLP, Swin-Mixer, and MAXIM (ours)]

Links: 📜 Paper · 🌟 Code · Tweet · Zhihu · Check out the web demo!
