
When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Zhang Yi, Zeng Wang, Jin Sheng, Qian Chen, Luo Ping, Liu Wentao


Abstract

Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g. RGB, IR, Depth, LiDAR and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that only process one or a pair of specific modality inputs, MMPedestron is able to process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion, and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e. MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored for specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves performance comparable to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
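The abstract says MMPedestron accepts arbitrary combinations of modalities and uses two learnable tokens (MAA and MAF) for adaptive fusion, but it does not describe how those tokens interact with the modal features. The sketch below illustrates one generic way a single learnable query token can fuse a variable set of modalities: scaled dot-product attention pooling over the tokens of whichever modalities are present. This is an assumption for illustration only, not the paper's MAA/MAF design; the function name `fuse_with_query_token`, the dictionary layout, and the token values are hypothetical. The authors' actual PyTorch implementation is in the linked repository.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    # Plain dot product; assumes both vectors have the same length.
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_with_query_token(query: Vector, modal_tokens: Dict[str, List[Vector]]) -> Vector:
    """Hypothetical fusion step (not the paper's method).

    A single learnable query token attends over the tokens of every modality
    that is present, so any subset of modalities (RGB only, RGB + IR, ...)
    yields one fused vector of the same size.
    """
    # Concatenate tokens from all available modalities in a fixed (sorted) order.
    tokens = [t for name in sorted(modal_tokens) for t in modal_tokens[name]]
    if not tokens:
        raise ValueError("at least one modality must be provided")
    dim = len(query)
    scale = 1.0 / math.sqrt(dim)
    # Attention weights: softmax of scaled query-token similarities.
    weights = softmax([dot(query, t) * scale for t in tokens])
    # Fused output: attention-weighted average of the tokens.
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]


if __name__ == "__main__":
    query = [0.5, -0.5]  # stands in for a learned token; values are arbitrary
    rgb_only = {"rgb": [[1.0, 0.0], [0.0, 1.0]]}
    rgb_ir = {"rgb": [[1.0, 0.0], [0.0, 1.0]], "ir": [[2.0, 0.0]]}
    print(fuse_with_query_token(query, rgb_only))
    print(fuse_with_query_token(query, rgb_ir))
```

Because the output is a weighted average over however many tokens are supplied, adding or dropping a modality changes only the attention weights, not the output shape; that property is what makes a token-based fusion scheme a natural fit for dynamic modality combinations.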

Code Repositories

BubblyYi/MMPedestron (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
multispectral-object-detection-on-flir-1 | MMPedestron | mAP50: 86.4%
object-detection-on-crowdhuman-full-body | MMPedestron | AP: 97.1, mMR: 30.8
object-detection-on-eventped | MMPedestron | AP: 79.0
object-detection-on-inoutdoor | MMPedestron | AP: 65.7
object-detection-on-stcrowd | MMPedestron | AP: 74.9
pedestrian-detection-on-llvip | MMPedestron | AP: 72.6
pedestrian-detection-on-mmpd-dataset | MMPedestron | box mAP: 79.0
