4 months ago

CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models

Kim Junho; Chung Hyungjin; Kim Byung-Hoon

Abstract

Category-agnostic pose estimation (CAPE) has traditionally relied on support images with annotated keypoints, a process that is often cumbersome and may fail to fully capture the necessary correspondences across diverse object categories. Recent efforts have begun exploring text-based queries, which eliminate the need for support keypoints. However, the optimal use of textual descriptions for keypoints remains underexplored. In this work, we introduce CapeLLM, a novel approach that leverages a text-based multimodal large language model (MLLM) for CAPE. Our method uses only the query image and detailed text descriptions as input to estimate category-agnostic keypoints. We conduct extensive experiments to systematically explore the design space of LLM-based CAPE, investigating factors such as the choice of keypoint descriptions, neural network architectures, and training strategies. Thanks to the advanced reasoning capabilities of the pre-trained MLLM, CapeLLM demonstrates superior generalization and robust performance. Our approach sets a new state-of-the-art on the MP-100 benchmark in the challenging 1-shot setting, marking a significant advancement in the field of category-agnostic pose estimation.

Benchmarks

Benchmark: category-agnostic-pose-estimation-on-mp100
Methodology: CapeLLM
Metrics: Mean PCK@0.2 - 1shot: 92.60
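
For readers unfamiliar with the metric, PCK@0.2 (Percentage of Correct Keypoints) is commonly computed by counting a predicted keypoint as correct when it lies within 0.2 times a reference object size of the ground truth. The sketch below illustrates that common definition only. It is not the authors' evaluation code: the function name `pck`, the use of a single `bbox_size` value (for example, the longer bounding-box side) as the normalizer, and the sample coordinates are all assumptions made for illustration.

```python
import math


def pck(pred, gt, visible, bbox_size, alpha=0.2):
    """Fraction of visible keypoints predicted within alpha * bbox_size of the ground truth.

    pred, gt  : lists of (x, y) tuples, one per keypoint
    visible   : list of booleans; invisible keypoints are skipped
    bbox_size : reference size used for normalization (assumed here to be
                the longer side of the object bounding box)
    """
    threshold = alpha * bbox_size
    correct = 0
    total = 0
    for (px, py), (gx, gy), vis in zip(pred, gt, visible):
        if not vis:
            continue  # keypoints without annotation do not count
        total += 1
        if math.hypot(px - gx, py - gy) <= threshold:
            correct += 1
    return correct / total if total else float("nan")


# Hypothetical example: four keypoints, one of them not visible.
pred = [(10.0, 10.0), (50.0, 52.0), (90.0, 40.0), (0.0, 0.0)]
gt = [(12.0, 11.0), (50.0, 50.0), (60.0, 40.0), (5.0, 5.0)]
visible = [True, True, True, False]

score = pck(pred, gt, visible, bbox_size=100.0)
print(f"PCK@0.2 = {score:.4f}")
```

With a bounding-box size of 100, the threshold is 20 pixels, so two of the three visible keypoints count as correct. A "Mean PCK" figure such as the one reported above would typically average a score like this over images and dataset splits; the exact averaging protocol is not stated on this page.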
