ENTER: Event Based Interpretable Reasoning for VideoQA

Abstract

In this paper, we present ENTER, an interpretable Video Question Answering (VideoQA) system based on event graphs. Event graphs convert videos into graphical representations, where video events form the nodes and event-event relationships (temporal/causal/hierarchical) form the edges. This structured representation offers many benefits: 1) interpretable VideoQA via generated code that parses the event graph; 2) incorporation of contextual visual information into the reasoning process (code generation) via event graphs; 3) robust VideoQA via Hierarchical Iterative Update of the event graphs. Existing interpretable VideoQA systems are often top-down, disregarding low-level visual information in reasoning-plan generation, and are brittle. Bottom-up approaches, by contrast, produce responses directly from visual data but lack interpretability. Experimental results on NExT-QA, IntentQA, and EgoSchema demonstrate that our method not only outperforms existing top-down approaches while obtaining competitive performance against bottom-up approaches, but, more importantly, offers superior interpretability and explainability in the reasoning process.
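The abstract describes events as graph nodes and temporal/causal/hierarchical relations as edges, with answers produced by generated code that parses this graph. The paper defines the actual schema and code-generation procedure; as a rough illustration of the idea only (not the authors' implementation), a minimal sketch might look like the following, where the Event/EventGraph classes, field names, and the `neighbors` helper are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal event-graph structures; the paper's actual schema may differ.
@dataclass
class Event:
    event_id: str
    description: str  # e.g. "the boy picks up the ball"

@dataclass
class EventGraph:
    events: dict = field(default_factory=dict)  # event_id -> Event
    edges: list = field(default_factory=list)   # (src_id, relation, dst_id)

    def add_event(self, event):
        self.events[event.event_id] = event

    def add_edge(self, src, relation, dst):
        # relation is one of the edge types named in the abstract:
        # "temporal", "causal", or "hierarchical"
        self.edges.append((src, relation, dst))

    def neighbors(self, event_id, relation):
        """Return events linked to `event_id` by the given relation type."""
        return [self.events[dst] for (src, rel, dst) in self.edges
                if src == event_id and rel == relation]

# The kind of graph-parsing snippet such a system could generate for a question
# like "What does the boy do after picking up the ball?"
graph = EventGraph()
graph.add_event(Event("e1", "the boy picks up the ball"))
graph.add_event(Event("e2", "the boy throws the ball"))
graph.add_edge("e1", "temporal", "e2")  # e1 happens before e2

answer = [e.description for e in graph.neighbors("e1", "temporal")]
print(answer)  # ['the boy throws the ball']
```

In this illustrative setup, interpretability comes from the fact that the reasoning is an explicit, inspectable traversal of named events and relations rather than an opaque end-to-end prediction.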

Benchmarks

Benchmark | Methodology | Metrics
zero-shot-video-question-answer-on-intentqa | ENTER | Accuracy: 71.5
zero-shot-video-question-answer-on-next-qa | ENTER | Accuracy: 75.1
