3 months ago

Transformer-Based Unified Recognition of Two Hands Manipulating Objects

Seungryul Baek, Elkhan Ismayilzada, Seongyeong Lee, Jihyeon Kim, Chanwoo Kim, Hoseong Cho

Abstract

Understanding hand-object interactions from egocentric video has received great attention recently. So far, most approaches are based on convolutional neural network (CNN) features combined with temporal encoding via long short-term memory (LSTM) or graph convolutional networks (GCN) to provide a unified understanding of two hands, an object, and their interactions. In this paper, we propose a Transformer-based unified framework that provides a better understanding of two hands manipulating objects. Our framework takes the whole image depicting two hands, an object, and their interactions as input and jointly estimates three kinds of information from each frame: the poses of the two hands, the pose of the object, and the object type. Afterwards, the action class defined by the hand-object interactions is predicted from the entire video based on the estimated information combined with a contact map that encodes the interaction between the two hands and the object. Experiments on the H2O and FPHA benchmark datasets demonstrate the superiority of our method, which achieves state-of-the-art accuracy. Ablation studies further demonstrate the effectiveness of each proposed module.
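The abstract does not say how the contact map is computed or how the per-frame estimates are assembled for the action classifier. As a rough illustration only, the sketch below assumes a distance-threshold contact map (a hand joint counts as "in contact" if any object point lies within a fixed distance) and flattens one frame's estimates into a feature vector. The function names `contact_map` and `frame_features`, the 2 cm threshold, and the feature layout are all hypothetical and are not taken from the paper; the Transformer that would consume the per-frame vectors is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_map(hand_joints: Sequence[Point],
                object_points: Sequence[Point],
                threshold: float = 0.02) -> List[int]:
    """Return one 0/1 flag per hand joint.

    ASSUMPTION: a joint is "in contact" when its nearest object point is
    within `threshold` (same units as the coordinates, e.g. metres).
    The paper's actual contact map may be defined differently.
    """
    flags = []
    for joint in hand_joints:
        nearest = min(math.dist(joint, p) for p in object_points)
        flags.append(1 if nearest <= threshold else 0)
    return flags


def frame_features(left_hand: Sequence[Point],
                   right_hand: Sequence[Point],
                   object_points: Sequence[Point],
                   object_class: int,
                   threshold: float = 0.02) -> List[float]:
    """Flatten one frame's estimates plus contact flags into a vector.

    Hypothetical layout: left-hand xyz, right-hand xyz, object xyz,
    object class id, left contact flags, right contact flags.
    """
    feats: List[float] = []
    for points in (left_hand, right_hand, object_points):
        for p in points:
            feats.extend(p)
    feats.append(float(object_class))
    feats.extend(contact_map(left_hand, object_points, threshold))
    feats.extend(contact_map(right_hand, object_points, threshold))
    return feats


# Toy frame: two left-hand joints, one right-hand joint, one object point.
left = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
right = [(1.0, 1.0, 1.0)]
obj = [(0.01, 0.0, 0.0)]
features = frame_features(left, right, obj, object_class=3)
```

In a full pipeline, one such vector per frame would form the sequence that a video-level classifier maps to an action label; that stage is left out here because the abstract gives no architectural details.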

Benchmarks

Benchmark: action-recognition-on-h2o-2-hands-and-objects
Methodology: H2OTR
Metrics:
Actions Top-1: 90.90
Hand Pose: 3D (est.)
Object Label: Yes (est.)
Object Pose: Yes (est.)
RGB: Yes
