5 months ago

SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images

Pardis Taghavi; Reza Langari; Gaurav Pandey

Abstract

This paper presents a multi-task learning framework that performs depth estimation and semantic segmentation concurrently from a single camera. The approach is based on a shared encoder-decoder architecture that integrates several techniques to improve the accuracy of both tasks without compromising computational efficiency. The framework also incorporates an adversarial training component, a Wasserstein GAN with a critic network, to refine the model's predictions. It is evaluated on two datasets, the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset, and outperforms existing state-of-the-art methods in both segmentation and depth estimation. Ablation studies analyze the contributions of individual components, including pre-training strategies, the inclusion of critics, logarithmic depth scaling, and advanced image augmentations. The accompanying source code is available at https://github.com/PardisTaghavi/SwinMTL.
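Two of the components named in the abstract, the Wasserstein critic and logarithmic depth scaling, can be illustrated with a minimal sketch. It uses the textbook Wasserstein GAN objectives and one common way of mapping depth into a normalized log range. The function names, the `d_min`/`d_max` bounds, and the normalization itself are assumptions made for illustration; the abstract does not specify the exact formulation, loss weighting, or critic regularization used in SwinMTL.

```python
import math


def critic_loss(real_scores, fake_scores):
    """Textbook Wasserstein critic loss: mean(critic(fake)) - mean(critic(real)).

    The critic is trained to minimize this, i.e. to score ground-truth maps
    higher than predicted ones. Weight clipping or a gradient penalty, which
    a practical WGAN also needs, is omitted here.
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real


def generator_adv_loss(fake_scores):
    """Adversarial term for the predictor: raise the critic's score on predictions."""
    return -sum(fake_scores) / len(fake_scores)


def to_log_depth(depth, d_min, d_max):
    """Map a metric depth in [d_min, d_max] to [0, 1] on a log scale (assumed form)."""
    return (math.log(depth) - math.log(d_min)) / (math.log(d_max) - math.log(d_min))


def from_log_depth(value, d_min, d_max):
    """Inverse of to_log_depth."""
    return math.exp(value * (math.log(d_max) - math.log(d_min)) + math.log(d_min))
```

On a log scale, equal ratios of depth map to equal steps, so a 1 m error at 2 m counts for more than a 1 m error at 80 m. This is a common motivation for log-depth targets in outdoor scenes.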

Code Repositories

pardistaghavi/swinmtl (official implementation, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| depth-estimation-on-cityscapes-test | SwinMTL | RMSE: 6.352 |
| monocular-depth-estimation-on-cityscapes | SwinMTL | Absolute relative error (AbsRel): 0.089; RMSE: 5.481; RMSE log: 0.139; Square relative error (SqRel): 1.051 |
| multi-task-learning-on-cityscapes | SwinMTL | RMSE: 0.51; mIoU: 76.41 |
| multi-task-learning-on-nyuv2 | SwinMTL | Mean IoU: 58.14 |
| real-time-semantic-segmentation-on-cityscapes | SwinMTL | mIoU: 76.41% |
| semantic-segmentation-on-cityscapes | SwinMTL | Mean IoU (class): 76.41% |
| semantic-segmentation-on-cityscapes-val | SwinMTL | mIoU: 76.41 |
| semantic-segmentation-on-nyu-depth-v2 | SwinMTL | Mean IoU: 58.14% |
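For readers unfamiliar with the metrics listed above, the following is a minimal sketch of their usual definitions in the monocular depth and segmentation literature. It operates on flat lists of per-pixel values. The function names are illustrative, and the evaluation protocol behind the reported numbers (depth cap, valid-pixel mask, ignored classes) is not given on this page, so the filtering below is an assumption.

```python
import math


def depth_metrics(pred, gt):
    """AbsRel, SqRel, RMSE and RMSE log over pixels with positive depth."""
    # Assumption: pixels with non-positive ground truth or prediction are invalid.
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    return {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
    }


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union across classes that occur in pred or gt."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, g in zip(pred_labels, gt_labels):
        if not 0 <= g < num_classes:
            continue  # assumption: out-of-range ground-truth labels are ignored
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[g] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious)
```

A small usage example: `depth_metrics([2.0, 4.0], [2.0, 2.0])` gives an AbsRel of 0.5 and a SqRel of 1.0, and `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], 2)` averages the per-class IoUs 1/2 and 2/3.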
