| DVIS-DAQ(ViT-L, Offline) | 83.8 | 62.9 | - | - | 57.1 | DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries | |
| CAVIS(ViT-L, Offline) | 82.6 | 63.5 | 21.2 | 61.8 | 57.1 | Context-Aware Video Instance Segmentation | |
| DVIS(Swin-L, Offline) | 75.9 | 53.0 | 19.4 | 55.3 | 49.9 | DVIS: Decoupled Video Instance Segmentation Framework | |
| DVIS++(ViT-L, Online) | 72.5 | 55.0 | 20.8 | 54.6 | 49.6 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |
| UNINEXT (ViT-H, Online) | 72.5 | 52.2 | - | - | 49.0 | Universal Instance Perception as Object Discovery and Retrieval | |
| DVIS(Swin-L, Online) | 71.9 | 49.2 | 19.4 | 52.5 | 47.1 | DVIS: Decoupled Video Instance Segmentation Framework | |
| RefineVIS (Swin-L, Offline) | 70.4 | 48.4 | 19.1 | 51.2 | 46.0 | RefineVIS: Video Instance Segmentation with Temporal Attention Refinement | - |
| GenVIS (Swin-L) | 69.2 | 47.8 | 18.9 | 49.0 | 45.4 | A Generalized Framework for Video Instance Segmentation | |
| DVIS++(R50, Offline) | 68.9 | 40.9 | 16.8 | 47.3 | 41.2 | DVIS++: Improved Decoupled Framework for Universal Video Segmentation | |
| BoxVIS(Swin-L & Box-sup) | 68.4 | 39.9 | - | - | 40.6 | BoxVIS: Video Instance Segmentation with Box Annotations | |