Referring Video Object Segmentation on ReVOS
Evaluation Metrics
F (contour accuracy)
J (region similarity)
J&F (mean of J and F)
R (robustness score)
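As a rough illustration of how J and J&F relate, the sketch below computes J as the intersection-over-union of two binary masks and J&F as the arithmetic mean of J and F. The function names `region_similarity` and `j_and_f`, the list-of-rows mask format, and the empty-mask convention are assumptions made for this example; this is not the benchmark's official evaluation code, and F (which needs boundary matching) and R are left out.

```python
def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks.

    Masks are assumed to be equally sized lists of rows holding 0/1 values
    (a hypothetical format chosen to keep the sketch dependency-free).
    """
    pairs = [(p, g) for pred_row, gt_row in zip(pred, gt)
             for p, g in zip(pred_row, gt_row)]
    intersection = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2


pred_mask = [[1, 1],
             [0, 0]]
gt_mask = [[1, 0],
           [0, 0]]
j_example = region_similarity(pred_mask, gt_mask)  # 1 shared pixel, 2 in the union

# Scores as listed for VISA (Chat-UniVi-13B) in the results: J = 48.8, F = 52.9.
jf_example = j_and_f(48.8, 52.9)
```

With the VISA (Chat-UniVi-13B) scores, the mean comes to 50.85, which agrees with the 50.9 listed for that entry after rounding to one decimal place.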
Results
Performance of each model on this benchmark
| Model | F | J | J&F | R | Paper Title | Repository |
|---|---|---|---|---|---|---|
| VRS-HQ (Chat-UniVi-13B) | 62.5 | 57.6 | 60.0 | 18.9 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | |
| VRS-HQ (Chat-UniVi-7B) | 61.6 | 56.6 | 59.1 | 19.7 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | |
| VISA (Chat-UniVi-13B) | 52.9 | 48.8 | 50.9 | 14.5 | VISA: Reasoning Video Object Segmentation via Large Language Models | |
| VISA (Chat-UniVi-7B) | 49.0 | 44.9 | 46.9 | 15.5 | VISA: Reasoning Video Object Segmentation via Large Language Models | |
| TrackGPT (LLaVA-13B) | 46.8 | 43.2 | 45.0 | 12.8 | Tracking with Human-Intent Reasoning | |
| LISA (LLaVA-13B) | 43.5 | 39.8 | 41.6 | 8.6 | LISA: Reasoning Segmentation via Large Language Model | |
| LMPM (Swin-T) | 31.7 | 21.2 | 26.4 | 3.2 | MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions | |
| ReferFormer (Video-Swin-B) | 29.9 | 26.2 | 28.1 | 8.8 | Language as Queries for Referring Video Object Segmentation | |
| MTTR (Video-Swin-T) | 25.9 | 25.1 | 25.5 | 5.6 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | |