Dense Video Captioning on YouCook2
Evaluation Metric
ROUGE-L
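The leaderboard only names the metric. ROUGE-L scores a generated caption by its longest common subsequence (LCS) with the reference caption(s). Below is a minimal sketch of sentence-level ROUGE-L. It assumes whitespace tokenization and the LCS-based F-measure with β = 1.2, taking the maximum precision and maximum recall over references, as in the widely used pycocoevalcap `Rouge` scorer. The function names `lcs_length` and `rouge_l` are hypothetical. The sketch does not reproduce the full dense-captioning protocol, in which predicted events are first matched to ground-truth segments by temporal IoU before captions are scored.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b (O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)          # extend the diagonal match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip a token from a or b
        prev = curr
    return prev[-1]


def rouge_l(candidate, references, beta=1.2):
    """Sentence-level ROUGE-L F-measure (assumed beta=1.2, pycocoevalcap-style max over references)."""
    cand = candidate.split()  # assumption: plain whitespace tokenization
    if not cand or not references:
        return 0.0
    precisions, recalls = [], []
    for ref in references:
        ref_tokens = ref.split()
        lcs = lcs_length(cand, ref_tokens)
        precisions.append(lcs / len(cand))
        recalls.append(lcs / len(ref_tokens) if ref_tokens else 0.0)
    p, r = max(precisions), max(recalls)
    if p == 0 or r == 0:
        return 0.0
    return ((1 + beta ** 2) * p * r) / (r + beta ** 2 * p)


# Usage: LCS is "add salt to pan" (4 tokens), so P = 4/5 and R = 4/7.
score = rouge_l("add salt to the pan", ["add the salt to a hot pan"])
print(round(score, 4))
```

Because β > 1, recall is weighted more heavily than precision, so a short caption that matches only part of the reference is penalized more than a longer caption containing extra words.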
Evaluation Results
Results of each model on this benchmark
| Model | ROUGE-L | Paper Title | Repository |
|---|---|---|---|
| E2vidD6-MASSalign-BiD | 39.03 | Multimodal Pretraining for Dense Video Captioning | |
| Vid2Seq (HowTo100M+VidChapters-7M PT) | - | - | - |
| CM² | - | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | |
| PDVC (TSN features, no SCST) | - | End-to-End Dense Video Captioning with Parallel Decoding | |
| HiCM² | - | HiCM²: Hierarchical Compact Memory Modeling for Dense Video Captioning | |
| GVL | - | Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos | |
| Vid2Seq | - | Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning | |