Vision Transformers for Dense Prediction

René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

Abstract

We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks. We assemble tokens from various stages of the vision transformer into image-like representations at various resolutions and progressively combine them into full-resolution predictions using a convolutional decoder. The transformer backbone processes representations at a constant and relatively high resolution and has a global receptive field at every stage. These properties allow the dense vision transformer to provide finer-grained and more globally coherent predictions when compared to fully-convolutional networks. Our experiments show that this architecture yields substantial improvements on dense prediction tasks, especially when a large amount of training data is available. For monocular depth estimation, we observe an improvement of up to 28% in relative performance when compared to a state-of-the-art fully-convolutional network. When applied to semantic segmentation, dense vision transformers set a new state of the art on ADE20K with 49.02% mIoU. We further show that the architecture can be fine-tuned on smaller datasets such as NYUv2, KITTI, and Pascal Context where it also sets the new state of the art. Our models are available at https://github.com/intel-isl/DPT.
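The core of the architecture is a reassemble stage that converts transformer tokens back into image-like feature maps at several resolutions, followed by a convolutional decoder that fuses them from coarse to fine into a full-resolution prediction. Below is a minimal PyTorch sketch of that idea; the module names, channel choices, and resampling strategy are illustrative simplifications, not the authors' implementation (see the DPT repository for the reference code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Reassemble(nn.Module):
    """Turn a sequence of ViT tokens into an image-like feature map.

    Simplified sketch: tokens are reshaped to the patch grid, projected with a
    1x1 convolution, and resampled to the desired fraction of the input size.
    """
    def __init__(self, embed_dim, out_channels, output_stride, patch_size=16):
        super().__init__()
        self.project = nn.Conv2d(embed_dim, out_channels, kernel_size=1)
        # resampling factor from the patch grid (stride `patch_size`)
        # to the target output stride, e.g. 4, 8, 16 or 32
        self.scale = patch_size / output_stride

    def forward(self, tokens, grid_hw):
        # tokens: (batch, num_patches, embed_dim); readout/cls token removed
        b, n, c = tokens.shape
        h, w = grid_hw
        x = tokens.transpose(1, 2).reshape(b, c, h, w)  # sequence -> 2D grid
        x = self.project(x)
        return F.interpolate(x, scale_factor=self.scale,
                             mode="bilinear", align_corners=False)

class Fusion(nn.Module):
    """Merge a coarser decoder feature into the next finer one."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, finer, coarser):
        coarser = F.interpolate(coarser, scale_factor=2,
                                mode="bilinear", align_corners=False)
        return self.conv(finer + coarser)
```

In this scheme, tokens taken from four transformer stages would each pass through a Reassemble block at strides 4, 8, 16, and 32, and Fusion blocks then combine the resulting maps from coarse to fine before a task-specific output head produces the dense prediction.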

Code Repositories

antocad/FocusOnDepth (PyTorch)
isl-org/MiDaS (PyTorch)
alexeyab/midas (PyTorch)
vishal-kataria/MiDaS-master (PyTorch)
EPFL-VILAB/3DCommonCorruptions (PyTorch)
huggingface/transformers (PyTorch)
chriswxho/dynamic-inference (PyTorch)
SforAiDl/vformer (PyTorch)
intel-isl/MiDaS (PyTorch)
ahmedmostafa0x61/Depth_Estimation (PyTorch)
danielzgsilva/MonoDepthAttacks (PyTorch)
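Of these, isl-org/MiDaS (previously published as intel-isl/MiDaS) hosts the reference DPT depth weights and exposes them through torch.hub. The following inference example follows the usage documented in that repository's README; model and transform names are taken from there and may change between releases.

```python
import cv2
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the DPT-Hybrid depth model and its matching input transform via torch.hub
model = torch.hub.load("intel-isl/MiDaS", "DPT_Hybrid").to(device).eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.dpt_transform

img = cv2.imread("input.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    batch = transform(img).to(device)
    prediction = model(batch)
    # Resize the predicted inverse depth map back to the input resolution
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze().cpu().numpy()
```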

Benchmarks

Benchmark (methodology) and reported metrics:

monocular-depth-estimation-on-eth3d (DPT)
  Delta < 1.25: 0.0946
  absolute relative error: 0.078

monocular-depth-estimation-on-kitti-eigen (DPT-Hybrid)
  Delta < 1.25: 0.959
  Delta < 1.25^2: 0.995
  Delta < 1.25^3: 0.999
  RMSE: 2.573
  RMSE log: 0.092
  absolute relative error: 0.062

monocular-depth-estimation-on-nyu-depth-v2 (DPT-Hybrid)
  Delta < 1.25: 0.904
  Delta < 1.25^2: 0.988
  Delta < 1.25^3: 0.994
  RMSE: 0.357
  absolute relative error: 0.110
  log 10: 0.045

semantic-segmentation-on-ade20k (DPT-Hybrid)
  Validation mIoU: 49.02

semantic-segmentation-on-ade20k-val (DPT-Hybrid)
  Pixel Accuracy: 83.11
  mIoU: 49.02

semantic-segmentation-on-pascal-context (DPT-Hybrid)
  mIoU: 60.46
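For reference, the depth metrics above follow the standard definitions used in monocular depth evaluation: absolute relative error, (log) RMSE, log10 error, and the fraction of pixels whose ratio to the ground truth falls below a threshold. A short sketch of these computations (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular depth metrics over valid ground-truth pixels.

    `pred` and `gt` are arrays of positive depths; the definitions follow the
    conventions commonly used for the KITTI and NYU benchmarks.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return {
        "abs_rel": np.mean(np.abs(pred - gt) / gt),
        "rmse": np.sqrt(np.mean((pred - gt) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "log10": np.mean(np.abs(np.log10(pred) - np.log10(gt))),
        "delta1": np.mean(ratio < 1.25),        # Delta < 1.25
        "delta2": np.mean(ratio < 1.25 ** 2),   # Delta < 1.25^2
        "delta3": np.mean(ratio < 1.25 ** 3),   # Delta < 1.25^3
    }
```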
