Referring Expression Segmentation On Refer 1
Evaluation metrics
F
J
J&F
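The page lists the three metrics without defining them. As a minimal sketch, assuming the usual DAVIS-style convention in which J is region similarity (mask IoU), F is contour accuracy, and J&F is their arithmetic mean, the combined score can be checked against rows reported on this leaderboard. The function name `j_and_f` is hypothetical and used only for illustration.

```python
def j_and_f(j: float, f: float) -> float:
    """Return J&F, assumed here to be the arithmetic mean of J and F."""
    return (j + f) / 2.0


# (model, F, J, reported J&F) copied from rows on this leaderboard.
rows = [
    ("MPG-SAM 2", 76.1, 71.7, 73.9),
    ("GLEE-Pro", 72.9, 68.2, 70.6),
    ("MUTR", 70.4, 66.4, 68.4),
]

for name, f, j, reported in rows:
    computed = j_and_f(j, f)
    # The published J and F values are already rounded to one decimal,
    # so the recomputed mean may differ from the reported one by up to 0.1.
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

Because the leaderboard publishes rounded values, a recomputed mean should be compared with a small tolerance rather than for exact equality.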
Evaluation results
Performance of each model on this benchmark
| Model name | F | J | J&F | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| MPG-SAM 2 | 76.1 | 71.7 | 73.9 | MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation | |
| VRS-HQ (Chat-UniVi-13B) | 73.1 | 69 | 71 | The Devil is in Temporal Token: High Quality Video Reasoning Segmentation | |
| GLEE-Pro | 72.9 | 68.2 | 70.6 | General Object Foundation Model for Images and Videos at Scale | |
| UNINEXT-H | 72.7 | 67.6 | 70.1 | Universal Instance Perception as Object Discovery and Retrieval | |
| ReferDINO (Swin-B) | 71.5 | 67.0 | 69.3 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | - |
| MUTR | 70.4 | 66.4 | 68.4 | Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation | |
| VLP (VLMo-L) | 69.8 | 65.3 | 67.6 | Harnessing Vision-Language Pretrained Models with Temporal-Aware Adaptation for Referring Video Object Segmentation | - |
| UniRef-L (Swin-L) | 69.2 | 65.5 | 67.4 | Segment Every Reference Object in Spatial and Temporal Spaces | - |
| SOC (Joint training, Video-Swin-B) | 69.3 | 65.3 | 67.3±0.5 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | |
| HTR (Pre-training) | 68.9 | 65.3 | 67.1 | Temporally Consistent Referring Video Object Segmentation with Hybrid Memory | |
| DsHmp (Video-Swin-Base) | 69.1 | 65 | 67.1 | Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation | |
| UniRef++-L | 69.0 | 64.8 | 66.9 | UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces | |
| ViLLa | 68.6 | 64.6 | 66.5 | ViLLa: Video Reasoning Segmentation with Large Language Model | |
| DEVA (ReferFormer) | - | - | 66.0 | Tracking Anything with Decoupled Video Segmentation | |
| SgMg (Pre-training) | 67.4 | 63.9 | 65.7 | Spectrum-guided Multi-granularity Referring Video Object Segmentation | |
| GroPrompt | 66.9 | 64.1 | 65.5 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | - |
| EPCFormer (ViT-H) | 67.2 | 62.9 | 65 | Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation | - |
| UniLSeg-100 | 67.0 | 62.8 | 64.9 | Universal Segmentation at Arbitrary Granularity with Language Instruction | |
| LoSh-R | 66.0 | 62.5 | 64.2 | LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation | |
| VLT | 65.6 | 61.9 | 63.8 | VLT: Vision-Language Transformer and Query Generation for Referring Segmentation | |