WildQA: In-the-Wild Video Question Answering

Santiago Castro, Naihao Deng, Pingxuan Huang, Mihai Burzo, Rada Mihalcea

Abstract

Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose WILDQA, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WILDQA poses new challenges to the vision and language research communities. The dataset is available at https://lit.eecs.umich.edu/wildqa/.

Benchmarks

| Benchmark | Methodology | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|
| video-question-answering-on-wildqa | T5 (text + video) | 33.1 ± 0.3 | 17.3 ± 0.4 | 31.9 ± 0.2 |
| video-question-answering-on-wildqa | T5 (text) | 33.8 ± 0.2 | 17.7 ± 0.1 | 32.4 ± 0.3 |
| video-question-answering-on-wildqa | Multi (text + video, IO) | 34.0 ± 0.5 | 18.8 ± 0.7 | 32.8 ± 0.6 |
| video-question-answering-on-wildqa | T5 (text, zero-shot) | 0.8 ± 0.0 | 0.0 ± 0.0 | 0.8 ± 0.0 |
| video-question-answering-on-wildqa | Multi (text + video, SE) | 33.8 ± 0.8 | 18.5 ± 0.7 | 32.5 ± 0.8 |
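
The benchmark scores are ROUGE variants, which measure n-gram overlap between a generated answer and a reference answer. The page does not state the tokenization, the ROUGE variant (precision, recall, or F1), or how the ± values were obtained, so the following is only a minimal sketch of ROUGE-N, assuming lowercase whitespace tokenization and the F1 variant. The function name `rouge_n_f1` and the example strings are hypothetical and are not taken from the WildQA evaluation code.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Illustrative ROUGE-N F1 (assumed variant, not the official scorer)."""

    def ngrams(text: str) -> Counter:
        # Assumption: lowercase + whitespace split; real scorers may stem or strip punctuation.
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answer pair, for illustration only.
    print(rouge_n_f1("a river runs through the valley", "a river in the valley", n=1))
    print(rouge_n_f1("a river runs through the valley", "a river in the valley", n=2))
```

A score on the 0 to 100 scale used in the table would correspond to this value multiplied by 100 and averaged over the evaluation set.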
