End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context

Yiran Guan, Zhuoguang Chen, Wenzheng Zeng, Zhiguo Cao, Yang Xiao

Abstract

In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning manner, which has not been well explored yet. The main advantage of MCGaze is that the clue localization tasks for the head, face, and eye can be solved jointly with gaze estimation in a one-step manner, with joint optimization to seek optimal performance. During this process, spatial-temporal context is exchanged among the head, face, and eye clues. Accordingly, the final gaze, obtained by fusing features from the various queries, is simultaneously aware of global clues from the head and face and local clues from the eyes, which essentially boosts performance. Meanwhile, the one-step design also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our proposition. The source code will be released at https://github.com/zgchen33/MCGaze.
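
The fusion of global (head, face) and local (eye) query features described above can be illustrated with a minimal sketch. The module below, ClueFusionHead, and its concatenation-plus-MLP fusion are illustrative assumptions, not the official MCGaze implementation; tensor names such as head_feat, face_feat, and eye_feat are hypothetical.

```python
# Minimal sketch (not the official MCGaze code): fusing per-frame
# head, face, and eye query features into a gaze direction.
# The fusion strategy and tensor names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClueFusionHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Project the concatenated head/face/eye clue features and
        # regress a 3D gaze direction vector per frame.
        self.fuse = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, 3),
        )

    def forward(self, head_feat, face_feat, eye_feat):
        # Each input: (batch, time, dim) query features for a video clip.
        fused = torch.cat([head_feat, face_feat, eye_feat], dim=-1)
        gaze = self.fuse(fused)                # (batch, time, 3)
        return F.normalize(gaze, dim=-1)       # unit gaze vectors

# Example usage with random features for a 7-frame clip.
B, T, D = 2, 7, 256
head = torch.randn(B, T, D)
face = torch.randn(B, T, D)
eye = torch.randn(B, T, D)
print(ClueFusionHead(D)(head, face, eye).shape)  # torch.Size([2, 7, 3])
```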

Code Repositories

zgchen33/mcgaze (official, PyTorch)

Benchmarks

Benchmark: Gaze Estimation on Gaze360
Methodology: MCGaze
Metrics: Angular Error 10.02°
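
For context, the Angular Error metric reported above is the angle, in degrees, between the predicted and ground-truth 3D gaze vectors, averaged over the test set. A minimal sketch of this standard metric (not the official MCGaze evaluation code) is shown below.

```python
# Sketch of the standard angular-error metric between 3D gaze vectors,
# as commonly used for Gaze360-style evaluation (not the official
# MCGaze evaluation code).
import math
import torch
import torch.nn.functional as F

def angular_error_deg(pred, target):
    # pred, target: (N, 3) gaze direction vectors (need not be unit length).
    pred = F.normalize(pred, dim=-1)
    target = F.normalize(target, dim=-1)
    cos = (pred * target).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos))  # per-sample error in degrees

# Example: a prediction that is 10 degrees off around the y-axis.
gt = torch.tensor([[0.0, 0.0, 1.0]])
pred = torch.tensor([[math.sin(math.radians(10.0)), 0.0,
                      math.cos(math.radians(10.0))]])
print(angular_error_deg(pred, gt))  # ≈ tensor([10.])
```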
