MAXIM: Multi-Axis MLP for Image Processing

Zhengzhong Tu; Hossein Talebi; Han Zhang; Feng Yang; Peyman Milanfar; Alan Bovik; Yinxiao Li

Abstract

Recent progress on Transformers and multi-layer perceptron (MLP) models provides new network architectural designs for computer vision tasks. Although these models have proved effective in many vision tasks such as image recognition, there remain challenges in adapting them for low-level vision. The inflexibility to support high-resolution images and the limitations of local attention are perhaps the main bottlenecks. In this work, we present a multi-axis MLP-based architecture called MAXIM that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. MAXIM uses a UNet-shaped hierarchical structure and supports long-range interactions enabled by spatially-gated MLPs. Specifically, MAXIM contains two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature conditioning. Both of these modules are exclusively based on MLPs, but also benefit from being both global and "fully-convolutional", two properties that are desirable for image processing. Our extensive experimental results show that the proposed MAXIM model achieves state-of-the-art performance on more than ten benchmarks across a range of image processing tasks, including denoising, deblurring, deraining, dehazing, and enhancement, while requiring fewer or comparable numbers of parameters and FLOPs than competitive models. The source code and trained models will be available at https://github.com/google-research/maxim.
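
To make the multi-axis idea more concrete, here is a minimal NumPy sketch of how local and global spatial mixing could be combined. It is not the official implementation: the function names, the block size `b`, the grid size `d`, and the weight initialisation are illustrative assumptions, and layer normalisation, channel projections, and residual connections are left out. Half of the channels are mixed within non-overlapping b x b windows (local), and the other half are mixed across a d x d set of pixels strided over the whole image (global), each with a gMLP-style spatial gate.

```python
import numpy as np

def block_partition(x, b):
    """(H, W, C) -> (groups, b*b, C); each group is one b x b window."""
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, b * b, C)

def block_merge(t, H, W, b):
    """Inverse of block_partition."""
    C = t.shape[-1]
    x = t.reshape(H // b, W // b, b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def grid_partition(x, d):
    """(H, W, C) -> (groups, d*d, C); each group is d*d pixels strided over the image."""
    H, W, C = x.shape
    x = x.reshape(d, H // d, d, W // d, C).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, d * d, C)

def grid_merge(t, H, W, d):
    """Inverse of grid_partition."""
    C = t.shape[-1]
    x = t.reshape(H // d, W // d, d, d, C).transpose(2, 0, 3, 1, 4)
    return x.reshape(H, W, C)

def spatial_gate(tokens, weight, bias):
    """gMLP-style gate: split channels, mix one half along the token axis, multiply."""
    u, v = np.split(tokens, 2, axis=-1)
    v = np.einsum("nm,gmc->gnc", weight, v) + bias[None, :, None]
    return u * v  # output has half the input channels

def multi_axis_gated_mlp(x, b, d, params):
    """Simplified sketch: local (block) and global (grid) gated mixing on channel halves."""
    H, W, _ = x.shape
    x_local, x_global = np.split(x, 2, axis=-1)
    local = spatial_gate(block_partition(x_local, b), *params["local"])
    glob = spatial_gate(grid_partition(x_global, d), *params["global"])
    return np.concatenate(
        [block_merge(local, H, W, b), grid_merge(glob, H, W, d)], axis=-1
    )

rng = np.random.default_rng(0)
b, d = 4, 4  # hypothetical block and grid sizes
params = {
    "local": (0.01 * rng.standard_normal((b * b, b * b)), np.ones(b * b)),
    "global": (0.01 * rng.standard_normal((d * d, d * d)), np.ones(d * d)),
}

# The same weights are applied to two different input resolutions.
for H, W in [(16, 16), (32, 24)]:
    x = rng.standard_normal((H, W, 8))
    print((H, W), "->", multi_axis_gated_mlp(x, b, d, params).shape)
```

In this sketch the mixing weights have shapes (b*b, b*b) and (d*d, d*d), so they depend only on the block and grid sizes and not on the image size. That is one way to read the "fully-convolutional" property described in the abstract: the same parameters apply to any input whose height and width are divisible by `b` and `d`.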

Code Repositories

google-research/maxim (official, JAX)
vztu/maxim-pytorch (PyTorch, mentioned in GitHub)
sayakpaul/maxim-tf (TensorFlow, mentioned in GitHub)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| deblurring-on-based | MAXIM (REDS) | ERQAv2.0: 0.74277; LPIPS: 0.07836; SSIM: 0.94959; Subjective: 1.0081; VMAF: 67.3502 |
| deblurring-on-based | MAXIM (GoPro) | LPIPS: 0.09188; PSNR: 31.36344; SSIM: 0.94386; Subjective: 0.2070; VMAF: 67.7557 |
| deblurring-on-based-1 | MAXIM (REDS) | PSNR: 30.65728 |
| deblurring-on-gopro | MAXIM-3S | PSNR: 32.86 |
| deblurring-on-hide | MAXIM-3S | PSNR: 32.83 |
| deblurring-on-hide-trained-on-gopro | MAXIM | PSNR (sRGB): 32.83; Params (M): 22.2; SSIM (sRGB): 0.956 |
| deblurring-on-realblur-j-1 | MAXIM | PSNR (sRGB): 32.84; Params (M): 22.2; SSIM (sRGB): 0.935 |
| deblurring-on-realblur-j-trained-on-gopro | MAXIM | PSNR (sRGB): 28.83; SSIM (sRGB): 0.875 |
| deblurring-on-realblur-r | MAXIM | PSNR (sRGB): 39.45 |
| deblurring-on-realblur-r | MAXIM-3S | SSIM (sRGB): 0.961 |
| deblurring-on-realblur-r-trained-on-gopro | MAXIM | PSNR (sRGB): 35.78 |
| image-deblurring-on-gopro | MAXIM-3S | PSNR: 32.86 |
| image-deblurring-on-hide | MAXIM-3S | SSIM: 0.956 |
| image-dehazing-on-sots-indoor | MAXIM-2S | PSNR: 38.11 |
| image-dehazing-on-sots-outdoor | MAXIM-2S | PSNR: 34.19 |
| image-denoising-on-dnd | MAXIM-3S | PSNR (sRGB): 39.84; SSIM (sRGB): 0.954 |
| image-denoising-on-sidd | MAXIM-3S | PSNR (sRGB): 39.96; SSIM (sRGB): 0.960 |
| low-light-image-enhancement-on-lol | MAXIM | Average PSNR: 23.43; SSIM: 0.863 |
| photo-retouching-on-mit-adobe-5k | MAXIM | PSNR: 26.15; SSIM: 0.945 |
| single-image-deraining-on-rain100h | MAXIM | SSIM: 0.903 |
| single-image-deraining-on-rain100l | MAXIM | SSIM: 0.977 |
| single-image-deraining-on-test100 | MAXIM | PSNR: 31.17; SSIM: 0.922 |
| single-image-deraining-on-test1200 | MAXIM | SSIM: 0.922 |
| single-image-deraining-on-test2800 | MAXIM | PSNR: 33.80 |
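
Many of the results above are reported as PSNR, in decibels. As a reminder of what that number measures, the sketch below computes PSNR from the mean squared error between a reference image and a restored one. The function name and the `data_range` default are assumptions for illustration. The evaluation protocol behind each benchmark entry (colour space, channel, cropping) is not stated here, so this is not necessarily how the listed values were obtained.

```python
import numpy as np

def psnr(reference, restored, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible pixel value."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy example: a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
clean = np.full((4, 4, 3), 0.5)
restored = clean + 0.1
print(round(psnr(clean, restored), 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of 1 dB corresponds to roughly a 21% reduction in MSE.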
