When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
Zhang Yi; Zeng Wang; Jin Sheng; Qian Chen; Luo Ping; Liu Wentao

Abstract
Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion, and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e. MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves performance comparable to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
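The abstract states that MMPedestron fuses any combination of available modalities using extra learnable tokens (MAA and MAF), but it does not describe how those tokens operate. As a rough, generic illustration of how a single learnable token can fuse a variable set of modality features, the sketch below uses single-head scaled dot-product attention with the token as the query. It is a hypothetical sketch, not the paper's implementation: the function names, the absence of learned projections, and the random stand-in for a "learned" token are all assumptions.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_token(fusion_token: Vector,
                    modality_tokens: Dict[str, List[Vector]]) -> Vector:
    """Hypothetical fusion: attend from one learnable token to all present modalities.

    The fusion token is the attention query; every token from every
    available modality serves as both key and value (no learned projections,
    to keep the sketch small). Missing modalities are simply absent from the
    dict, so any combination (RGB only, RGB+IR, ...) uses the same code path.
    """
    tokens = [t for name in sorted(modality_tokens) for t in modality_tokens[name]]
    if not tokens:
        raise ValueError("at least one modality must provide tokens")
    d = len(fusion_token)
    scores = [dot(fusion_token, t) / math.sqrt(d) for t in tokens]
    weights = softmax(scores)
    # Attention-weighted average of the value vectors, one coordinate at a time.
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    # Stand-in for a learned fusion token; random here purely for illustration.
    fusion = [random.gauss(0, 1) for _ in range(dim)]

    def rand_tokens(n: int) -> List[Vector]:
        return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

    rgb_only = {"rgb": rand_tokens(3)}
    rgb_ir_depth = {"rgb": rand_tokens(3), "ir": rand_tokens(3), "depth": rand_tokens(2)}
    print(len(fuse_with_token(fusion, rgb_only)))      # 4
    print(len(fuse_with_token(fusion, rgb_ir_depth)))  # 4
```

Because the output size depends only on the fusion token's dimension, adding or dropping a modality changes which tokens are attended to but not the shape of the fused feature passed to a downstream detection head.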
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| multispectral-object-detection-on-flir-1 | MMPedestron | mAP50: 86.4% |
| object-detection-on-crowdhuman-full-body | MMPedestron | AP: 97.1, mMR: 30.8 |
| object-detection-on-eventped | MMPedestron | AP: 79.0 |
| object-detection-on-inoutdoor | MMPedestron | AP: 65.7 |
| object-detection-on-stcrowd | MMPedestron | AP: 74.9 |
| pedestrian-detection-on-llvip | MMPedestron | AP: 72.6 |
| pedestrian-detection-on-mmpd-dataset | MMPedestron | box mAP: 79.0 |