Referring Expression Segmentation On Refer 1

Evaluation Metrics

F
J
J&F
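
The page lists these metrics without defining them. In video object segmentation benchmarks, J conventionally denotes region similarity (the intersection-over-union of predicted and ground-truth masks), F denotes contour accuracy (a boundary F-measure), and J&F is the mean of the two. Assuming that convention applies to this benchmark, the sketch below shows J for a pair of toy masks and J&F from given J and F scores. The function names and toy masks are illustrative only, and the boundary-based F computation is omitted for brevity.

```python
def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2


# Toy 2x3 masks (hypothetical data, not taken from the benchmark).
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]
j_example = region_similarity(pred, gt)  # 2 shared pixels out of 4 in the union

# Scores from the top leaderboard row: J = 71.7, F = 76.1.
top_row = j_and_f(71.7, 76.1)
print(j_example, round(top_row, 1))
```

Under this reading, the top row's reported J&F of 73.9 is consistent with the mean of its F (76.1) and J (71.7) scores.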

Evaluation Results

Performance of each model on this benchmark

| Model | F | J | J&F | Paper Title | Repository |
|---|---|---|---|---|---|
| MPG-SAM 2 | 76.1 | 71.7 | 73.9 | MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | |
| VRS-HQ (Chat-UniVi-13B) | 73.1 | 69 | 71 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | |
| GLEE-Pro | 72.9 | 68.2 | 70.6 | General Object Foundation Model for Images and Videos at Scale | |
| UNINEXT-H | 72.7 | 67.6 | 70.1 | Universal Instance Perception as Object Discovery and Retrieval | |
| ReferDINO (Swin-B) | 71.5 | 67.0 | 69.3 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | - |
| MUTR | 70.4 | 66.4 | 68.4 | Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | |
| VLP (VLMo-L) | 69.8 | 65.3 | 67.6 | Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation | - |
| UniRef-L (Swin-L) | 69.2 | 65.5 | 67.4 | Segment Every Reference Object in Spatial and Temporal Spaces | - |
| SOC (Joint training, Video-Swin-B) | 69.3 | 65.3 | 67.3±0.5 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | |
| HTR (Pre-training) | 68.9 | 65.3 | 67.1 | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | |
| DsHmp (Video-Swin-Base) | 69.1 | 65 | 67.1 | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | |
| UniRef++-L | 69.0 | 64.8 | 66.9 | UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | |
| ViLLa | 68.6 | 64.6 | 66.5 | ViLLa: Video Reasoning Segmentation with Large Language Model | |
| DEVA (ReferFormer) | - | - | 66.0 | Tracking Anything with Decoupled Video Segmentation | |
| SgMg (Pre-training) | 67.4 | 63.9 | 65.7 | Spectrum-guided Multi-granularity Referring Video Object Segmentation | |
| GroPrompt | 66.9 | 64.1 | 65.5 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | - |
| EPCFormer (ViT-H) | 67.2 | 62.9 | 65 | Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | - |
| UniLSeg-100 | 67.0 | 62.8 | 64.9 | Universal Segmentation at Arbitrary Granularity with Language Instruction | |
| LoSh-R | 66.0 | 62.5 | 64.2 | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | |
| VLT | 65.6 | 61.9 | 63.8 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | |