Affordance Grounding from Demonstration Video to Target Image

Joya Chen; Difei Gao; Kevin Qinghong Lin; Mike Zheng Shou

Abstract

Humans excel at learning from expert demonstrations and applying what they learn to their own problems. To equip intelligent robots and assistants, such as AR glasses, with this ability, it is essential to ground human hand interactions (i.e., affordances) from demonstration videos and apply them to a target image, such as the view from a user's AR glasses. This video-to-image affordance grounding task is challenging because (1) it requires predicting fine-grained affordances, and (2) the limited training data inadequately covers video-image discrepancies, which degrades grounding. To tackle these challenges, we propose Affordance Transformer (Afformer), which has a fine-grained transformer-based decoder that gradually refines affordance grounding. Moreover, we introduce Mask Affordance Hand (MaskAHand), a self-supervised pre-training technique that synthesizes video-image data and simulates context changes, enhancing affordance grounding across video-image discrepancies. Afformer with MaskAHand pre-training achieves state-of-the-art performance on multiple benchmarks, including a substantial 37% improvement on the OPRA dataset. Code is available at https://github.com/showlab/afformer.

Code Repositories

showlab/afformer (official implementation, PyTorch)

Benchmarks

Benchmark: video-to-image-affordance-grounding-on-epic
Methodology: Afformer
Metrics: AUC-J: 0.88; KLD: 0.97; SIM: 0.56

Benchmark: video-to-image-affordance-grounding-on-opra
Methodology: Afformer (ViTDet-B encoder)
Metrics: KLD: 1.51; Top-1 Action Accuracy: 52.27

Benchmark: video-to-image-affordance-grounding-on-opra
Methodology: Afformer (ResNet-50-FPN encoder)
Metrics: KLD: 1.55; Top-1 Action Accuracy: 52.14

Benchmark: video-to-image-affordance-grounding-on-opra-1
Methodology: Afformer
Metrics: AUC-J: 0.89; KLD: 1.05; SIM: 0.53
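
To make the KLD and SIM columns easier to read, here is a minimal sketch of how these two scores are commonly computed between a predicted and a ground-truth affordance heatmap. It assumes the formulations widely used in saliency evaluation (KL divergence and histogram intersection over maps normalized to sum to 1); the page does not state the exact evaluation protocol behind the numbers above, so the function names, the EPS constant, and the toy heatmaps are illustrative assumptions rather than the authors' code. Lower KLD and higher SIM indicate a closer match.

```python
import math

EPS = 1e-12  # assumed small constant guarding against log(0) and division by zero


def normalize(heatmap):
    """Flatten a 2D heatmap (list of rows) and scale it so its values sum to 1."""
    flat = [float(v) for row in heatmap for v in row]
    if any(v < 0 for v in flat):
        raise ValueError("heatmap values must be non-negative")
    total = sum(flat)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    return [v / total for v in flat]


def kld(pred, target):
    """KL divergence of the prediction from the ground truth (lower is better)."""
    p, q = normalize(pred), normalize(target)
    if len(p) != len(q):
        raise ValueError("heatmaps must have the same number of cells")
    return sum(qi * math.log(EPS + qi / (pi + EPS)) for pi, qi in zip(p, q))


def sim(pred, target):
    """Histogram intersection of the two normalized maps (higher is better, max 1)."""
    p, q = normalize(pred), normalize(target)
    if len(p) != len(q):
        raise ValueError("heatmaps must have the same number of cells")
    return sum(min(pi, qi) for pi, qi in zip(p, q))


# Toy 2x2 heatmaps (hypothetical data, not taken from any benchmark).
same = [[1, 0], [0, 1]]
disjoint = [[0, 1], [1, 0]]

print("identical maps:", kld(same, same), sim(same, same))
print("disjoint maps: ", kld(same, disjoint), sim(same, disjoint))
```

Identical maps give a KLD near 0 and a SIM near 1, while maps with no overlapping mass give a large KLD and a SIM of 0, which is the direction of improvement to keep in mind when comparing the rows above.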
