

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

Jiwan Chung; Youngjae Yu


Abstract

Large language models such as GPT-3 have demonstrated an impressive ability to adapt to new tasks without task-specific training data. This ability has been particularly effective in settings such as narrative question answering, where the diversity of tasks is immense but little supervision data is available. In this work, we investigate whether such language models can extend their zero-shot reasoning abilities to long multimodal narratives in multimedia content such as dramas, movies, and animation, where the story plays an essential role. We propose Long Story Short, a framework for narrative video QA that first summarizes the narrative of the video into a short plot and then searches for the parts of the video relevant to the question. We also propose CLIPCheck to enhance visual matching. Our model outperforms state-of-the-art supervised models by a large margin, highlighting the potential of zero-shot QA for long videos.
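The abstract describes the framework only at a high level. Below is a toy sketch of a summarize-then-search flow, not the authors' implementation. It assumes each clip of the video is already available as text (for example, subtitles), and it replaces the language-model summarizer and answerer with simple word-overlap heuristics. All names (`Clip`, `summarize`, `search`, `answer`) and the example story are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Clip:
    """One segment of a long video, represented here only by its text (hypothetical)."""
    clip_id: int
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased set of words; a crude stand-in for real text understanding.
    return set(re.findall(r"[a-z']+", text.lower()))


def summarize(clip: Clip, max_words: int = 8) -> str:
    # Hypothetical stand-in for a language-model summarizer: keep the first few words.
    return " ".join(clip.text.split()[:max_words])


def search(plot: list[str], question: str, top_k: int = 1) -> list[int]:
    # Rank plot pieces by word overlap with the question; return indices of the best clips.
    q = tokenize(question)
    scores = [len(q & tokenize(piece)) for piece in plot]
    ranked = sorted(range(len(plot)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


def answer(clips: list[Clip], question: str, options: list[str]) -> str:
    # 1) Summarize: compress every clip into a short plot sentence.
    plot = [summarize(c) for c in clips]
    # 2) Search: find the clips whose plot pieces best match the question.
    selected = search(plot, question)
    context = tokenize(" ".join(clips[i].text for i in selected))
    # 3) Answer: pick the option sharing the most words with the retrieved full text.
    return max(options, key=lambda o: len(tokenize(o) & context))


clips = [
    Clip(0, "Anna meets Ben at the train station and they argue about the letter"),
    Clip(1, "Ben burns the letter in the kitchen fireplace late at night"),
    Clip(2, "Anna leaves town on a morning bus without saying goodbye"),
]
question = "Where does Ben burn the letter?"
options = ["At the train station", "In the kitchen fireplace", "On the bus"]
print(answer(clips, question, options))  # In the kitchen fireplace
```

The heuristics only show how the two stages hand data to each other: the question is matched against the short plot, and the full text of the selected clips is then used to choose among the answer options. In the framework described above, a language model would perform the summarizing and answering.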


Benchmarks

Benchmark | Methodology | Metrics
video-story-qa-on-movieqa | Long Story Short | Accuracy: 51.49
