Zero-Shot Video Question Answering
The Zero-Shot Video Question Answering task aims to enable large language models to answer questions about video content accurately without task-specific training on the target data. The task sits at the intersection of computer vision and natural language understanding and tests a model's cross-modal reasoning, because the model must analyze and respond to video data it has not seen during training. It has practical value in intelligent dialogue systems, video content retrieval, and automatic question answering.
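To make the "without task-specific training" setting concrete, here is a minimal sketch of how a frozen model might be scored on a multiple-choice video QA benchmark. It assumes a hypothetical `answer_fn` that wraps some pretrained video-language model and returns the index of the option it picks; the `MCQuestion` record and its field names are also illustrative assumptions, not the format of any specific benchmark. Open-ended benchmarks are often scored differently, and that is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MCQuestion:
    video_id: str           # hypothetical identifier for a video clip
    question: str
    options: Sequence[str]  # candidate answers shown to the model
    answer_index: int       # index of the correct option


def zero_shot_accuracy(
    questions: Sequence[MCQuestion],
    answer_fn: Callable[[str, str, Sequence[str]], int],
) -> float:
    """Fraction of questions a frozen model answers correctly.

    answer_fn is a hypothetical wrapper around a pretrained model; it is
    only queried here and never updated on these questions, which is what
    makes the evaluation zero-shot.
    """
    if not questions:
        raise ValueError("no questions to evaluate")
    correct = sum(
        1
        for q in questions
        if answer_fn(q.video_id, q.question, q.options) == q.answer_index
    )
    return correct / len(questions)


if __name__ == "__main__":
    data = [
        MCQuestion("v1", "What is the person holding?", ["a cup", "a phone"], 0),
        MCQuestion("v2", "What happens after the door opens?",
                   ["a dog enters", "the lights turn off"], 1),
    ]
    # Placeholder "model" that always picks the first option.
    always_first = lambda video_id, question, options: 0
    print(zero_shot_accuracy(data, always_first))  # prints 0.5
```

Keeping the model behind a plain callable separates the scoring logic from any particular model, so the same loop could wrap different systems without change.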
Benchmarks (with the model listed for each):
MSRVTT-QA: MovieChat
ActivityNet-QA: MovieChat
MSVD-QA: BT-Adapter (zero-shot)
EgoSchema (fullset): BIMBA-LLaVA-Qwen2-7B
NExT-QA: Tarsier (34B)
TGIF-QA: IG-VLM
IntentQA: IG-VLM
EgoSchema (subset): Tarsier (34B)
Video-MME: Gemini 1.5 Pro
TVQA: FrozenBiLM (with speech)
Video-MME (without subtitles): Video-RAG (based on LLaVA-Video)
NExT-GQA: no model listed
LongVideoBench: Gemini 1.5 Pro
STAR Benchmark: VideoChat2
MVBench: TS-LLaVA-34B
CinePile (a long video question answering dataset and benchmark): no model listed