HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation

Guorong Cai, Zongyue Wang, Yiping Chen, Ruisheng Wang, Jinhe Su, Changshe Zhang, Ting Han, Siyu Chen

Abstract

Semantic perception in driving scenarios plays a crucial role in intelligent transportation systems. However, existing Transformer-based semantic segmentation methods often fail to fully exploit their potential for dynamic understanding of driving scenes. These methods typically lack spatial reasoning: they do not effectively correlate image pixels with their spatial positions, which leads to attention drift. To address this issue, we propose a novel architecture, the Hierarchical Spatial Perception Transformer (HSPFormer), which integrates monocular depth estimation and semantic segmentation into a unified framework for the first time. We introduce the Spatial Depth Perception Auxiliary Network (SDPNet), a framework for multiscale feature extraction and multilayer depth map prediction that establishes hierarchical spatial coherence. Additionally, we design the Hierarchical Pyramid Transformer Network (HPTNet), which uses depth estimation as learnable position embeddings to form spatially correlated semantic representations and generate global contextual information. Experiments on benchmark datasets such as KITTI-360, Cityscapes, and NYU Depth V2 demonstrate that HSPFormer outperforms several state-of-the-art networks, achieving 66.82% top-1 mIoU on KITTI-360, 83.8% mIoU on Cityscapes, and 57.7% mIoU on NYU Depth V2, respectively. The code will be made publicly available at https://github.com/SY-Ch/HSPFormer.
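The abstract states only that HPTNet uses depth estimation as learnable position embeddings; it does not give the formulation. The following is a minimal sketch of one way such an embedding could be built, not the authors' implementation. It assumes the depth map is averaged per image patch and mapped to the token dimension by a single linear projection. The names `patch_mean`, `add_depth_position_embedding`, `weight`, and `bias` are hypothetical.

```python
import numpy as np

def patch_mean(depth, patch):
    """Average an (H, W) depth map over non-overlapping patch x patch windows."""
    h, w = depth.shape
    gh, gw = h // patch, w // patch
    # Crop to a multiple of the patch size, then split into a grid of windows.
    d = depth[:gh * patch, :gw * patch].reshape(gh, patch, gw, patch)
    return d.mean(axis=(1, 3)).reshape(-1)  # shape (gh * gw,)

def add_depth_position_embedding(tokens, depth, patch, weight, bias):
    """Add a depth-derived position embedding to patch tokens.

    tokens: (N, D) patch features, with N = number of patches.
    weight, bias: (D,) parameters of an assumed linear projection; in a
    real model these would be learned, here they are plain arrays.
    """
    d = patch_mean(depth, patch)                         # (N,)
    pos = d[:, None] * weight[None, :] + bias[None, :]   # (N, D)
    return tokens + pos

# Usage: a 4x4 depth map split into 2x2 patches gives 4 tokens.
depth = np.arange(16, dtype=float).reshape(4, 4)
tokens = np.zeros((4, 3))
out = add_depth_position_embedding(tokens, depth, 2, np.ones(3), np.zeros(3))
```

Because the embedding is computed from depth rather than from a fixed index, tokens at similar depths receive similar position signals under this assumption, which is one plausible reading of "spatially correlated semantic representations".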

Benchmarks

Benchmark | Methodology | Metrics
semantic-segmentation-on-kitti-360 | HSPFormer-DBS (RGB-Depth) | mIoU: 67.32
semantic-segmentation-on-kitti-360 | HSPFormer-UFS (RGB) | mIoU: 66.82
semantic-segmentation-on-nyu-depth-v2 | HSPFormer (PVT v2-B4) | Mean IoU: 57.8%
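
For readers unfamiliar with the metric, mean IoU (mIoU) averages the per-class intersection-over-union between predicted and ground-truth label maps. The sketch below shows the standard confusion-matrix computation. The function name `mean_iou` and the `ignore_index=255` default are assumptions for illustration, and the exact evaluation protocol behind the reported numbers (class set, ignored labels, averaging over the dataset) is not given on this page.

```python
import numpy as np

def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in the prediction or ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Drop pixels whose ground-truth label is marked as ignored.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]
    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())

# Usage: class 0 has IoU 1/2 and class 1 has IoU 2/3; class 2 is absent.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
score = mean_iou(pred, target, num_classes=3)
```

In practice the confusion matrix is usually accumulated over the whole validation set before the per-class ratios are taken, rather than averaging per-image scores.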
