Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)
Hou Yunzhong; Zheng Liang

Abstract
Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behaviors might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
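The idea of augmenting a view while keeping multiview consistency can be illustrated with a small sketch. It is not taken from the MVDeTr repository; it only assumes that each camera has a 3x3 homography mapping image pixels to the ground plane, and that an augmentation can be written as a 3x3 affine matrix `A` acting on pixel coordinates. Under that assumption, replacing the homography `H` by `H @ inv(A)` keeps every augmented pixel projecting to the same ground-plane location. The function names and all numeric values below are hypothetical.

```python
import numpy as np

def random_affine(rng, max_scale=0.2, max_shift=20.0):
    """Sample a 3x3 homogeneous matrix: isotropic scale plus translation (illustrative choice)."""
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty],
                     [0.0, 0.0, 1.0]])

def compensate_homography(H_img_to_ground, A):
    """Return H' such that H' applied to an augmented pixel equals H applied to the original pixel."""
    return H_img_to_ground @ np.linalg.inv(A)

def project(H, pts):
    """Apply a 3x3 homography to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

rng = np.random.default_rng(0)

# Hypothetical image-to-ground homography for one camera.
H = np.array([[0.9, 0.1, 5.0],
              [-0.05, 1.1, -3.0],
              [1e-4, 2e-4, 1.0]])

# Hypothetical pixel locations (e.g. foot points of pedestrians).
pts = np.array([[100.0, 200.0], [640.0, 360.0], [1200.0, 700.0]])

A = random_affine(rng)                # augmentation applied to the image
pts_aug = project(A, pts)             # where the pixels land after augmentation
H_aug = compensate_homography(H, A)   # projection adjusted to match

ground_before = project(H, pts)
ground_after = project(H_aug, pts_aug)
print(np.allclose(ground_before, ground_after))
```

Sampling a different `A` per camera and compensating each homography in this way leaves all views aligned on the shared ground plane, which is the consistency property the abstract refers to.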
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multiview-detection-on-citystreet | MVDeTr | F1_score (2m): 75.2; MODA (2m): 58.3; MODP (2m): 74.1; Precision (2m): 92.8; Recall (2m): 63.2 |
| multiview-detection-on-cvcs | MVDeTr | F1_score (1m): 61.0; MODA (1m): 39.8; MODP (1m): 84.1; Precision (1m): 95.3; Recall (1m): 44.9 |
| multiview-detection-on-multiviewx | MVDeTr | MODA: 93.7; MODP: 91.3; Recall: 94.2 |
| multiview-detection-on-wildtrack | MVDeTr | MODA: 91.5; MODP: 82.1; Recall: 94.0 |