When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Zhang Yi; Zeng Wang; Jin Sheng; Qian Chen; Luo Ping; Liu Wentao

Abstract

Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron can process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion, and a general head for pedestrian detection. We introduce two extra learnable tokens, namely MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves performance comparable to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
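The abstract says MMPedestron fuses any combination of available modalities using two extra learnable tokens (MAA and MAF), but it does not describe their internals. The sketch below is only a generic illustration of the underlying idea, a learnable query token that attention-pools over whichever modality features happen to be present, written as scaled dot-product attention in plain Python. The function `fuse_with_token`, the modality keys, and the fixed query values are hypothetical; this is not the paper's MAA/MAF design.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_token(token: Vector, features: Dict[str, Vector]) -> Vector:
    """Attention-pool whatever modality features are present (hypothetical sketch).

    `token` stands in for a learnable fusion query; in a trained model it would
    be a learned parameter, here it is just a fixed vector. `features` maps a
    modality name (e.g. "rgb", "ir") to a feature vector of the same length as
    `token`. Any non-empty subset of modalities is accepted.
    """
    if not features:
        raise ValueError("at least one modality is required")
    dim = len(token)
    names = sorted(features)  # deterministic order, independent of dict order
    for name in names:
        if len(features[name]) != dim:
            raise ValueError(f"feature for {name!r} has length != {dim}")
    # Scaled dot-product score between the token and each modality feature.
    scores = [
        sum(t * x for t, x in zip(token, features[n])) / math.sqrt(dim)
        for n in names
    ]
    weights = softmax(scores)
    # Fused output: attention-weighted sum of the modality features.
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, x in enumerate(features[n]):
            fused[i] += w * x
    return fused


if __name__ == "__main__":
    query = [1.0, 0.0]  # stand-in for a trained token
    print(fuse_with_token(query, {"rgb": [2.0, 0.0]}))  # [2.0, 0.0]
    print(fuse_with_token(query, {"rgb": [2.0, 0.0], "ir": [0.0, 2.0]}))
```

Because the softmax runs over however many modalities are supplied, the same token handles a single RGB input or an RGB+IR pair without any change in shape, which is the property "dynamic combinations" refers to.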

Code Repositories

BubblyYi/MMPedestron (official, PyTorch)

Benchmarks

Benchmark	Method	Metrics
multispectral-object-detection-on-flir-1	MMPedestron	mAP50: 86.4%
object-detection-on-crowdhuman-full-body	MMPedestron	AP: 97.1, mMR: 30.8
object-detection-on-eventped	MMPedestron	AP: 79.0
object-detection-on-inoutdoor	MMPedestron	AP: 65.7
object-detection-on-stcrowd	MMPedestron	AP: 74.9
pedestrian-detection-on-llvip	MMPedestron	AP: 72.6
pedestrian-detection-on-mmpd-dataset	MMPedestron	box mAP: 79.0
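The CrowdHuman row reports mMR alongside AP. mMR is the log-average miss rate, conventionally the geometric mean of the miss rate sampled at nine FPPI (false positives per image) values evenly spaced in log space over [10^-2, 10^0]; lower is better. The plain-Python sketch below illustrates that computation under stated assumptions. It is not the official CrowdHuman evaluation code, and the corner case where the curve has no point at or below a reference FPPI is handled here by assuming a miss rate of 1.0, which reference toolkits may treat differently.

```python
import math
from typing import List, Sequence, Tuple


def reference_fppi(num_points: int = 9) -> List[float]:
    """FPPI reference points evenly spaced in log space over [1e-2, 1e0]."""
    if num_points < 2:
        raise ValueError("need at least two reference points")
    step = 2.0 / (num_points - 1)
    return [10 ** (-2.0 + i * step) for i in range(num_points)]


def log_average_miss_rate(curve: Sequence[Tuple[float, float]]) -> float:
    """Geometric mean of miss rates sampled at the reference FPPI points.

    `curve` holds (fppi, miss_rate) pairs obtained by sweeping the detector's
    score threshold. At each reference point we take the lowest miss rate
    reached at an FPPI not exceeding it (for a monotone curve, that is the
    point with the largest such FPPI). Assumption: if no such point exists,
    the miss rate there is taken to be 1.0.
    """
    sampled = []
    for ref in reference_fppi():
        below = [mr for fppi, mr in curve if fppi <= ref]
        miss = min(below) if below else 1.0
        sampled.append(max(miss, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


if __name__ == "__main__":
    # Toy curve: miss rate 0.5 until FPPI 0.5, then 0.2.
    print(log_average_miss_rate([(0.001, 0.5), (0.5, 0.2)]))
```

Averaging in log space means the metric weights improvements at low FPPI (the high-precision regime) as heavily as those at high FPPI.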
