Exploring Efficiency of Vision Transformers for Self-Supervised Monocular Depth Estimation
Ilya Makarov, Aleksei Karpov
Abstract
Depth estimation is a crucial task for the creation of depth maps, one of the most important components for augmented reality (AR) and other applications. However, the most widely used hardware for AR and smartphones has only sparse depth sensors, and these acquire ground-truth depth with differing methods. Thus, depth estimation models that are robust enough for downstream AR tasks can only be trained reliably using self-supervised learning based on camera information. Previous works in the field mostly focus on self-supervised models with purely convolutional architectures, which do not take global spatial context into account. In this paper, we utilize vision transformer architectures for self-supervised monocular depth estimation and propose VTDepth, a vision transformer-based model that addresses the problem of global spatial context. We compare various combinations of convolutional and transformer architectures for self-supervised depth estimation and show that the best combination is a transformer-based encoder with a convolutional decoder. Our experiments demonstrate the efficiency of VTDepth for self-supervised depth estimation. Our set of models achieves state-of-the-art performance for self-supervised learning on the NYUv2 and KITTI datasets. Our code is available at https://github.com/ahbpp/VTDepth.
Benchmarks
| Benchmark | Method | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 | δ < 1.25^2 | δ < 1.25^3 |
|---|---|---|---|---|---|---|---|---|
| monocular-depth-estimation-on-kitti-eigen-1 | VTDepthB2 (stereo supervision) | 0.099 | 0.743 | 4.439 | 0.178 | 0.904 | 0.965 | 0.983 |
| monocular-depth-estimation-on-kitti-eigen-1 | VTDepthB2 (monocular supervision) | 0.105 | 0.762 | 4.530 | 0.182 | 0.893 | 0.964 | 0.983 |
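The page does not define the metrics reported above. As a rough guide, the sketch below computes them using the definitions commonly used for KITTI depth evaluation; it is an assumption that the benchmark follows exactly these formulas. The function name `depth_metrics` and the dictionary keys are hypothetical and are not taken from the VTDepth code. Real evaluation pipelines typically also mask invalid pixels, cap the depth range, and (for monocular training) rescale predictions before computing the errors; none of that is shown here.

```python
import math


def depth_metrics(pred, gt):
    """Commonly used depth-evaluation metrics (illustrative sketch only).

    pred, gt: equal-length sequences of strictly positive depths,
    one entry per valid pixel, in the same unit (e.g. metres).
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Symmetric ratio used by the threshold accuracies (the delta columns).
    ratios = [max(p / g, g / p) for p, g in pairs]

    return {
        # Mean of |pred - gt| / gt.
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        # Mean of (pred - gt)^2 / gt.
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        # Root mean squared error in depth units.
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        # Root mean squared error of log-depths.
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        # Fraction of pixels whose ratio is below 1.25, 1.25^2, 1.25^3.
        "delta_1": sum(r < 1.25 for r in ratios) / n,
        "delta_2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta_3": sum(r < 1.25 ** 3 for r in ratios) / n,
    }


if __name__ == "__main__":
    # Toy values, not data from the paper.
    for name, value in depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0]).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, lower is better for the four error columns and higher is better for the three δ columns, which is consistent with the stereo-supervised row being at least as good as the monocular one on every metric in the table.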