ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning
Shengchao Hu, Li Chen, Penghao Wu, Hongyang Li, Junchi Yan, Dacheng Tao

Abstract
Many existing autonomous driving paradigms involve a multi-stage discrete pipeline of tasks. To better predict the control signals and enhance user safety, an end-to-end approach that benefits from joint spatial-temporal feature learning is desirable. While there are some pioneering works on LiDAR-based input or implicit design, in this paper we formulate the problem in an interpretable vision-based setting. In particular, we propose a spatial-temporal feature learning scheme that yields a set of more representative features for perception, prediction and planning tasks simultaneously, called ST-P3. Specifically, an egocentric-aligned accumulation technique is proposed to preserve geometry information in 3D space before the bird's eye view transformation for perception; a dual pathway modeling is devised to take past motion variations into account for future prediction; a temporal-based refinement unit is introduced to compensate for recognizing vision-based elements for planning. To the best of our knowledge, we are the first to systematically investigate each part of an interpretable end-to-end vision-based autonomous driving system. We benchmark our approach against previous state-of-the-arts on the open-loop nuScenes dataset as well as closed-loop CARLA simulation. The results show the effectiveness of our method. Source code, model and protocol details are publicly available at https://github.com/OpenPerceptionX/ST-P3.
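The egocentric-aligned accumulation described above warps past bird's-eye-view (BEV) features into the current ego frame before fusing them, so that geometry stays consistent across time. Below is a minimal sketch of that idea, not the authors' implementation: it assumes PyTorch BEV feature tensors, uses hypothetical names such as `align_to_current_frame`, and replaces the paper's learned fusion with a plain average purely for illustration.

```python
# Minimal sketch (not the ST-P3 code) of egocentric-aligned accumulation:
# past BEV feature maps are warped into the current ego frame via an
# affine ego-motion transform, then accumulated. Names are hypothetical.
import torch
import torch.nn.functional as F


def align_to_current_frame(past_bev, ego_to_current):
    """Warp a past BEV feature map (B, C, H, W) into the current ego frame.

    ego_to_current: (B, 2, 3) affine transform (rotation + translation) in
    normalized BEV grid coordinates, assumed to come from ego odometry.
    """
    grid = F.affine_grid(ego_to_current, past_bev.shape, align_corners=False)
    return F.grid_sample(past_bev, grid, align_corners=False)


def accumulate_bev_features(bev_sequence, ego_transforms):
    """Accumulate past BEV features (oldest first) into one aligned tensor.

    bev_sequence: list of (B, C, H, W) tensors, one per past timestep.
    ego_transforms: list of (B, 2, 3) transforms into the current ego frame.
    """
    aligned = [align_to_current_frame(f, t)
               for f, t in zip(bev_sequence, ego_transforms)]
    # Simple average; ST-P3's actual fusion is learned.
    return torch.stack(aligned, dim=0).mean(dim=0)


if __name__ == "__main__":
    B, C, H, W = 1, 64, 200, 200
    feats = [torch.randn(B, C, H, W) for _ in range(3)]
    # Identity transforms stand in for real ego-motion here.
    identity = torch.eye(2, 3).unsqueeze(0).repeat(B, 1, 1)
    fused = accumulate_bev_features(feats, [identity] * 3)
    print(fused.shape)  # torch.Size([1, 64, 200, 200])
```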
Code Repositories

https://github.com/OpenPerceptionX/ST-P3
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| bird-s-eye-view-semantic-segmentation-on | ST-P3 | IoU (pedestrian, 224x480, visibility filter, 100x100 at 0.5): 14.5 |