Poseidon: A ViT-based Architecture for Multi-Frame Pose Estimation with Adaptive Frame Weighting and Multi-Scale Feature Fusion
Pace Cesare Davide ; De Nunzio Alessandro Marco ; De Stefano Claudio ; Fontanella Francesco ; Molinara Mario

Abstract
Human pose estimation, a vital task in computer vision, involves detecting and localising human joints in images and videos. While single-frame pose estimation has seen significant progress, it often fails to capture the temporal dynamics needed to understand complex, continuous movements. To address these limitations, we propose Poseidon, a novel multi-frame pose estimation architecture that extends the ViTPose model by integrating temporal information for enhanced accuracy and robustness. Poseidon introduces three key innovations: (1) an Adaptive Frame Weighting (AFW) mechanism that dynamically prioritises frames based on their relevance, ensuring that the model focuses on the most informative data; (2) a Multi-Scale Feature Fusion (MSFF) module that aggregates features from different backbone layers to capture both fine-grained details and high-level semantics; and (3) a Cross-Attention module for effective information exchange between central and contextual frames, enhancing the model's temporal coherence. The proposed architecture improves performance in complex video scenarios and offers scalability and computational efficiency suitable for real-world applications. Our approach achieves state-of-the-art performance on the PoseTrack21 and PoseTrack18 datasets, with mAP scores of 88.3 and 87.8, respectively, outperforming existing methods.
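The abstract does not specify how Adaptive Frame Weighting computes frame relevance, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each frame already has a scalar relevance score (in a real model this would likely come from a learned layer) and normalises the scores with a softmax before fusing the per-frame feature vectors. The names `softmax` and `adaptive_frame_weighting` and the use of plain Python lists are hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_frame_weighting(frame_features, relevance_scores):
    """Fuse per-frame feature vectors with softmax-normalised weights.

    frame_features: list of equal-length feature vectors, one per frame.
    relevance_scores: one scalar per frame (assumed given; hypothetical).
    Returns (weights, fused_feature_vector).
    """
    if len(frame_features) != len(relevance_scores):
        raise ValueError("need exactly one relevance score per frame")
    weights = softmax(relevance_scores)
    fused = [0.0] * len(frame_features[0])
    for w, feat in zip(weights, frame_features):
        for i, value in enumerate(feat):
            fused[i] += w * value  # weighted sum over frames
    return weights, fused


# Three frames with 2-D features; the middle frame is scored as most relevant.
weights, fused = adaptive_frame_weighting(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [0.1, 2.0, 0.3],
)
```

With equal scores the weights are uniform and the fused vector is the plain average of the frames; raising one frame's score shifts the fused features towards that frame.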
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| 2d-human-pose-estimation-on-jhmdb-2d-poses | Poseidon | PCK: 97.3 |
| multi-person-pose-estimation-on-posetrack2018 | Poseidon | Mean mAP: 87.8 |
| multi-person-pose-estimation-on-posetrack21-1 | Poseidon | Mean mAP: 88.3 |