4 months ago

Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion

Pace Cesare Davide ; De Nunzio Alessandro Marco ; De Stefano Claudio ; Fontanella Francesco ; Molinara Mario

Abstract

Human pose estimation, a vital task in computer vision, involves detecting and localising human joints in images and videos. While single-frame pose estimation has seen significant progress, it often fails to capture the temporal dynamics for understanding complex, continuous movements. We propose Poseidon, a novel multi-frame pose estimation architecture that extends the ViTPose model by integrating temporal information for enhanced accuracy and robustness to address these limitations. Poseidon introduces key innovations: (1) an Adaptive Frame Weighting (AFW) mechanism that dynamically prioritises frames based on their relevance, ensuring that the model focuses on the most informative data; (2) a Multi-Scale Feature Fusion (MSFF) module that aggregates features from different backbone layers to capture both fine-grained details and high-level semantics; and (3) a Cross-Attention module for effective information exchange between central and contextual frames, enhancing the model's temporal coherence. The proposed architecture improves performance in complex video scenarios and offers scalability and computational efficiency suitable for real-world applications. Our approach achieves state-of-the-art performance on the PoseTrack21 and PoseTrack18 datasets, achieving mAP scores of 88.3 and 87.8, respectively, outperforming existing methods.
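
The idea behind adaptive frame weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how relevance is computed, so the sketch assumes a scaled dot product between each frame's feature vector and the central frame's features, followed by a softmax. The names `frame_relevance` and `fuse_frames` are hypothetical, and plain Python lists stand in for the feature tensors a PyTorch model would use.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_relevance(center: Sequence[float], frame: Sequence[float]) -> float:
    """Hypothetical relevance score (an assumption, not from the paper):
    scaled dot product between a frame's features and the central frame's."""
    dot = sum(c * f for c, f in zip(center, frame))
    return dot / math.sqrt(len(center))


def fuse_frames(
    frames: List[List[float]], center_idx: int
) -> Tuple[List[float], List[float]]:
    """Weight every frame by its relevance to the central frame and
    return (weights, weighted sum of the frame features)."""
    center = frames[center_idx]
    scores = [frame_relevance(center, f) for f in frames]
    weights = softmax(scores)
    dim = len(center)
    fused = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return weights, fused


# Toy per-frame feature vectors (illustrative values only).
frames = [
    [1.0, 0.0, 0.0],  # previous frame, dissimilar to the central frame
    [0.0, 1.0, 0.0],  # central frame
    [0.0, 0.9, 0.1],  # next frame, similar to the central frame
]
weights, fused = fuse_frames(frames, center_idx=1)
print(weights)
print(fused)
```

In this toy setup the weights sum to one, and frames whose features resemble the central frame contribute more to the fused representation. A learned scoring network would replace the fixed dot product in a trained model.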

Code Repositories

CesareDavidePace/poseidon (official, PyTorch)

Benchmarks

Benchmark                                       Methodology   Metrics
2d-human-pose-estimation-on-jhmdb-2d-poses      Poseidon      PCK: 97.3
multi-person-pose-estimation-on-posetrack2018   Poseidon      Mean mAP: 87.8
multi-person-pose-estimation-on-posetrack21-1   Poseidon      Mean mAP: 88.3
