6 months ago

Viorica Pătrăucean, Lucas Smaira, Ankush Gupta, Adrià Recasens Continente, Larisa Markeeva, Dylan Banarse, Skanda Koppula, Joseph Heyward, Mateusz Malinowski, Yi Yang, and 14 more

Abstract

We propose a novel multimodal video benchmark, the Perception Test, to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g., Flamingo, SeViLA, or GPT-4). Whereas existing benchmarks focus on computational tasks (e.g., classification, detection, or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities in a zero-shot, few-shot, or limited fine-tuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos with an average length of 23 s, designed to show perceptually interesting situations and filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Comparing the human baseline with state-of-the-art video QA models shows a substantial gap in performance (91.4% vs 46.2%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baseline code, and challenge server are available at https://github.com/deepmind/perception_test
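To put the headline numbers in perspective, the short sketch below derives two quantities from figures quoted in the abstract: the approximate total footage and the human-model accuracy gap. The constant and function names are illustrative and not part of the benchmark code, and since 11.6k is a rounded count, the total duration is only an estimate.

```python
# Figures quoted in the abstract; names are illustrative, not from the benchmark code.
NUM_VIDEOS = 11_600      # "11.6k real-world videos" (rounded count)
AVG_LENGTH_S = 23        # average video length in seconds
HUMAN_ACC = 91.4         # human baseline accuracy, in percent
MODEL_ACC = 46.2         # best reported video QA model accuracy, in percent


def total_hours(num_videos: int, avg_length_s: float) -> float:
    """Approximate total footage in hours from a video count and mean length."""
    return num_videos * avg_length_s / 3600


def accuracy_gap(human: float, model: float) -> float:
    """Absolute gap in percentage points, rounded to one decimal place."""
    return round(human - model, 1)


hours = total_hours(NUM_VIDEOS, AVG_LENGTH_S)
gap = accuracy_gap(HUMAN_ACC, MODEL_ACC)

print(f"approximate total footage: {hours:.1f} hours")
print(f"human-model gap: {gap} percentage points")
```

Under these figures the dataset amounts to roughly 74 hours of video, and the model trails the human baseline by about 45 percentage points.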


Visual Question Answering


Perception Test: A Diagnostic Benchmark for Multimodal Video Models
