Referring Expression Segmentation On A2D

Evaluation Metrics

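AP (mean average precision, averaged over IoU thresholds 0.50:0.05:0.95)
IoU mean (per-sample IoU averaged over the test set)
IoU overall (total intersection divided by total union, accumulated over the test set)
Precision@0.5 (fraction of test samples whose IoU exceeds 0.5; the thresholds 0.6 to 0.9 below follow the same definition)
Precision@0.6
Precision@0.7
Precision@0.8
Precision@0.9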
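
The IoU-based metrics listed above (IoU mean, IoU overall and Precision@K) can be computed from per-sample masks roughly as in the following minimal Python sketch. It assumes each mask is a flattened sequence of 0/1 pixel labels, that Precision@K counts samples with IoU strictly above K, and that an empty prediction on an empty target scores IoU 1.0; the names `intersection_and_union` and `a2d_style_metrics` are hypothetical and do not come from any official benchmark toolkit. AP is left out because it is typically computed COCO-style from confidence-ranked predictions, which this sketch does not model.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical representation: each mask is a flattened sequence of 0/1 pixel labels.
Mask = Sequence[int]


def intersection_and_union(pred: Mask, gt: Mask) -> tuple[int, int]:
    """Count pixels in the intersection and in the union of two same-sized binary masks."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of pixels")
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def a2d_style_metrics(
    preds: Sequence[Mask],
    gts: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> dict[str, float]:
    """IoU mean, IoU overall and Precision@K over (prediction, ground truth) pairs."""
    if not preds or len(preds) != len(gts):
        raise ValueError("need a non-empty, equal number of predictions and ground truths")
    stats = [intersection_and_union(p, g) for p, g in zip(preds, gts)]
    # Per-sample IoU; an empty prediction on an empty target counts as 1.0 (an assumption).
    ious = [i / u if u else 1.0 for i, u in stats]
    total_inter = sum(i for i, _ in stats)
    total_union = sum(u for _, u in stats)
    metrics = {
        # Average of per-sample IoUs: every sample has the same weight.
        "IoU mean": sum(ious) / len(ious),
        # Pooled intersection over pooled union: large objects weigh more.
        "IoU overall": total_inter / total_union if total_union else 1.0,
    }
    for k in thresholds:
        # Fraction of samples whose IoU is strictly above the threshold k.
        metrics[f"Precision@{k}"] = sum(iou > k for iou in ious) / len(ious)
    return metrics


if __name__ == "__main__":
    preds = [[1, 1, 1, 0], [1, 0, 0, 0]]
    gts = [[1, 1, 0, 0], [1, 1, 1, 1]]
    for name, value in a2d_style_metrics(preds, gts).items():
        print(f"{name}: {value:.3f}")
```

For the toy pair in the `__main__` block, the per-sample IoUs are 2/3 and 1/4, so IoU mean is 11/24 (about 0.458) while IoU overall is 3/7 (about 0.429): the larger second target dominates the pooled counts. This is why the two IoU columns in the results can rank models differently.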

Evaluation Results

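Results of each model on this benchmark. All scores are fractions between 0 and 1, rows are sorted by IoU mean (highest first), and a dash (-) marks a value or repository that is not listed.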

| Model | AP | IoU mean | IoU overall | Precision@0.5 | Precision@0.6 | Precision@0.7 | Precision@0.8 | Precision@0.9 | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| SOC (Video-Swin-B) | 0.573 | 0.725 | 0.807 | 0.851 | 0.827 | 0.765 | 0.607 | 0.252 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Yes |
| SgMg (Video-Swin-B) | 0.585 | 0.720 | 0.799 | 0.843 | 0.822 | 0.767 | 0.617 | 0.259 | Spectrum-guided Multi-granularity Referring Video Object Segmentation | Yes |
| ReferFormer (Video-Swin-B) | 0.550 | 0.703 | 0.786 | 0.831 | 0.804 | 0.741 | 0.579 | 0.212 | Language as Queries for Referring Video Object Segmentation | Yes |
| SOC (Video-Swin-T) | 0.504 | 0.669 | 0.747 | 0.790 | 0.756 | 0.687 | 0.535 | 0.195 | SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation | Yes |
| ClawCraneNet | - | 0.655 | 0.644 | 0.704 | 0.677 | 0.617 | 0.489 | 0.171 | ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - |
| MTTR (w=10) | 0.461 | 0.640 | 0.720 | 0.754 | 0.712 | 0.638 | 0.485 | 0.169 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | Yes |
| MANET | 0.471 | 0.632 | 0.726 | 0.734 | 0.682 | 0.579 | 0.389 | 0.132 | Multi-Attention Network for Compressed Video Referring Object Segmentation | Yes |
| MTTR (w=8) | 0.447 | 0.618 | 0.702 | 0.721 | 0.684 | 0.607 | 0.456 | 0.164 | End-to-End Referring Video Object Segmentation with Multimodal Transformers | Yes |
| RefVOS | - | 0.599 | 0.599 | 0.495 | - | - | - | 0.064 | RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation | Yes |
| VLIDE | 0.469 | 0.598 | 0.714 | 0.702 | 0.663 | 0.585 | 0.428 | 0.151 | Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation | - |
| Locater | 0.465 | 0.597 | 0.690 | 0.709 | 0.640 | 0.525 | 0.351 | 0.101 | Local-Global Context Aware Transformer for Language-Guided Video Segmentation | Yes |
| CMPC-V (I3D) | 0.404 | 0.573 | 0.653 | 0.655 | 0.592 | 0.506 | 0.342 | 0.098 | Cross-Modal Progressive Comprehension for Referring Segmentation | Yes |
| Hui et al. | 0.399 | 0.561 | 0.662 | 0.654 | 0.589 | 0.497 | 0.333 | 0.091 | Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation | - |
| mmmmtbvs | 0.419 | 0.558 | 0.673 | 0.645 | 0.597 | 0.523 | 0.375 | 0.130 | Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation | Yes |
| AAMN | 0.396 | 0.552 | 0.617 | 0.681 | 0.629 | 0.523 | 0.296 | 0.029 | Actor and Action Modular Network for Text-based Video Segmentation | - |
| CMDy | 0.333 | 0.531 | 0.623 | 0.607 | 0.525 | 0.405 | 0.235 | 0.045 | Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries | - |
| PRPE | 0.388 | 0.529 | 0.661 | 0.634 | 0.579 | 0.483 | 0.322 | 0.083 | Polar Relative Positional Encoding for Video-Language Segmentation | - |
| HINet | - | 0.529 | 0.679 | 0.611 | 0.559 | 0.486 | 0.342 | 0.120 | Hierarchical interaction network for video object segmentation from referring expressions | - |
| CMPC-V (R2D) | 0.351 | 0.515 | 0.649 | 0.590 | 0.527 | 0.434 | 0.284 | 0.068 | Cross-Modal Progressive Comprehension for Referring Segmentation | Yes |
| RefVOS | - | 0.497 | 0.672 | 0.578 | 0.534 | 0.456 | 0.311 | 0.093 | Hierarchical interaction network for video object segmentation from referring expressions | - |
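
Because some rows have no value for a metric, any re-ranking of these results has to skip those entries explicitly. The minimal sketch below uses a hand-copied subset of rows (the `ROWS` list, `COLUMNS` mapping and `rank_by` helper are hypothetical illustrations, not part of the source page) and ranks models by a chosen metric, treating a missing value as "not ranked" rather than as zero.

```python
from __future__ import annotations

from typing import Optional

# Hand-copied subset of the leaderboard; None stands for a "-" (not reported) cell.
# Columns: model, AP, IoU mean.
ROWS: list[tuple[str, Optional[float], Optional[float]]] = [
    ("SOC (Video-Swin-B)", 0.573, 0.725),
    ("SgMg (Video-Swin-B)", 0.585, 0.720),
    ("ReferFormer (Video-Swin-B)", 0.550, 0.703),
    ("ClawCraneNet", None, 0.655),
    ("MTTR (w=10)", 0.461, 0.640),
]

# Hypothetical mapping from metric name to its column index in ROWS.
COLUMNS = {"AP": 1, "IoU mean": 2}


def rank_by(rows, metric: str) -> list[str]:
    """Model names ordered by `metric`, best first; rows lacking that metric are skipped."""
    col = COLUMNS[metric]
    scored = [(row[0], row[col]) for row in rows if row[col] is not None]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored]


if __name__ == "__main__":
    print(rank_by(ROWS, "AP"))
    print(rank_by(ROWS, "IoU mean"))
```

Ranked by AP, SgMg (Video-Swin-B) comes first and ClawCraneNet drops out entirely; ranked by IoU mean, SOC (Video-Swin-B) leads. The "best" model on this benchmark therefore depends on which metric is chosen.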
Source: HyperAI超神经 SOTA leaderboard.