Video Question Answering
Video Question Answering (VQA) is a task at the intersection of computer vision and natural language processing: given a video and a natural-language question about its content, a system analyzes the video and produces an answer. The goal is to jointly understand the visual and linguistic information in the video so that answers are grounded in what it actually shows, which in turn supports more precise information retrieval and interaction. VQA has applications in intelligent video assistants, educational platforms, and entertainment systems.
NExT-QA
LLaMA-VQA (33B)
ActivityNet-QA
FrozenBiLM
TVBench
Tarsier-34B
MVBench
ST-LLM
STAR Benchmark
VLAP (4 frames)
MSRVTT-QA
FrozenBiLM
How2QA
Text + Text (no Multimodal Pretext Training)
AGQA 2.0 balanced
GF (sup) - Faster RCNN
iVQA
FrozenBiLM
MSRVTT-MC
Singularity-temporal
TVQA
LLaMA-VQA
IntentQA
VideoChat2_mistral
Perception Test
InternVideo2 (8B)
SUTD-TrafficQA
WildQA
RoadTextVQA
GIT
NExT-QA (Efficient)
ViLA (3B, 4 frames)
LSMDC-MC
VIOLETv2
VideoQA
Just Ask (fine-tune)
MSVD-QA
MSR-VTT-MC
ATP (1<-16)
VLEP
Howto100M-QA
TimeSformer
DramaQA
TGIF-QA
MSR-VTT
LSMDC-FiB
Clover