Video Captioning on TVC
Metrics
BLEU-4
CIDEr
Results
Performance results of various models on this benchmark
| Model | BLEU-4 | CIDEr | Paper Title | Repository |
|---|---|---|---|---|
| VAST | 19.9 | 74.1 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| COSA | 18.8 | 70.7 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |