Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Li Bohan ; Deng Jiajun ; Zhang Wenyao ; Liang Zhujin ; Du Dalong ; Jin Xin ; Zeng Wenjun

Abstract
Camera-based 3D semantic scene completion (SSC) is pivotal for predicting complicated 3D layouts with limited 2D image observations. The existing mainstream solutions generally leverage temporal information by roughly stacking history frames to supplement the current frame; such straightforward temporal modeling inevitably diminishes valid clues and increases learning difficulty. To address this problem, we present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion. The primary innovation of this work involves decomposing temporal context learning into two hierarchical steps: (a) cross-frame affinity measurement and (b) affinity-based dynamic refinement. First, to separate critical relevant context from redundant information, we introduce the pattern affinity with scale-aware isolation and multiple independent learners for fine-grained contextual correspondence modeling. Subsequently, to dynamically compensate for incomplete observations, we adaptively refine the feature sampling locations based on initially identified locations with high affinity and their neighboring relevant regions. Our method ranks $1^{st}$ on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU on the OpenOccupancy benchmark. Our code is available at https://github.com/Arlo0o/HTCL.
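The two-step idea in the abstract (measure cross-frame affinity, then sample history features guided by that affinity) can be illustrated with a toy sketch. This is not the HTCL implementation: cosine similarity stands in for the paper's pattern affinity, a softmax-weighted top-k average stands in for the dynamic refinement of sampling locations, and the function names `cosine_affinity` and `aggregate_top_k` as well as the toy feature vectors are hypothetical. Scale-aware isolation and the multiple independent learners are not modeled.

```python
import math

def cosine_affinity(current, history):
    """Return affinity[i][j] = cosine similarity of current[i] and history[j].

    Assumption: cosine similarity is only a stand-in for the paper's
    pattern affinity, which is not specified in the abstract.
    """
    def norm(v):
        # Fall back to 1.0 for an all-zero vector to avoid division by zero.
        return math.sqrt(sum(x * x for x in v)) or 1.0

    return [
        [sum(a * b for a, b in zip(c, h)) / (norm(c) * norm(h)) for h in history]
        for c in current
    ]

def aggregate_top_k(affinity_row, history, k=2):
    """Softmax-weighted average of the k history features with highest affinity.

    Assumption: a simplified substitute for affinity-based refinement of
    sampling locations; it only reweights existing history features.
    """
    top = sorted(range(len(history)), key=lambda j: affinity_row[j], reverse=True)[:k]
    weights = [math.exp(affinity_row[j]) for j in top]
    total = sum(weights)
    dim = len(history[0])
    return [
        sum(w * history[j][d] for w, j in zip(weights, top)) / total
        for d in range(dim)
    ]

# Toy features: two current-frame locations, three history-frame locations.
current = [[1.0, 0.0], [0.0, 1.0]]
history = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

affinity = cosine_affinity(current, history)
refined = [aggregate_top_k(row, history, k=2) for row in affinity]
```

In this toy setup, each current-frame location draws only on the history features it most resembles, rather than on a plain stack of all history frames.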
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 3d-semantic-scene-completion-on-semantickitti | HTCL-S | mIoU: 17.09 |
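For readers unfamiliar with the metric in the table, mIoU (mean intersection over union) averages the per-class IoU between predicted and ground-truth labels. The sketch below is a minimal illustration on flat label lists; the function name `mean_iou` and the toy labels are hypothetical, and skipping classes absent from both prediction and target is an assumption. The benchmark's exact protocol (ignored classes, unknown voxels, dataset-level accumulation) is not described in this page.

```python
def mean_iou(pred, target, num_classes):
    """Mean of per-class IoU over classes present in pred or target.

    Assumption: classes that appear in neither list are skipped; a real
    benchmark may handle them differently.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0

# Toy example: six voxels, three semantic classes.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
score = mean_iou(pred, target, 3)
```

Here the per-class IoUs are 1, 1/3, and 1/2, so `score` is their mean, 11/18.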