Temporally Consistent Referring Video Object Segmentation with Hybrid Memory

Bo Miao; Mohammed Bennamoun; Yongsheng Gao; Mubarak Shah; Ajmal Mian

Abstract

Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR.

Code Repositories

bo-miao/HTR (Official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
referring-expression-segmentation-on-davis | HTR | J&F 1st frame: 65.6
referring-expression-segmentation-on-refer-1 | HTR (Pre-training) | F: 68.9, J: 65.3, J&F: 67.1
referring-video-object-segmentation-on-mevis | HTR | F: 45.5, J: 39.9, J&F: 42.7
referring-video-object-segmentation-on-refer | HTR | F: 68.9, J: 65.3, J&F: 67.1
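
The combined J&F figures listed above are consistent with the usual DAVIS-style convention, in which J&F is the arithmetic mean of region similarity (J) and contour accuracy (F). The sketch below checks that arithmetic against the reported numbers. The helper name `j_and_f` and the one-decimal rounding are assumptions made for illustration; they are not taken from the HTR repository.

```python
def j_and_f(j: float, f: float) -> float:
    """Mean of region similarity (J) and contour accuracy (F).

    Rounded to one decimal to match the precision of the listing
    (the rounding is an assumption, not part of the metric itself).
    """
    return round((j + f) / 2, 1)


# (J, F, reported J&F) copied from the benchmark listing; keys are the
# benchmark identifiers exactly as they appear on the page.
reported = {
    "referring-video-object-segmentation-on-refer": (65.3, 68.9, 67.1),
    "referring-video-object-segmentation-on-mevis": (39.9, 45.5, 42.7),
}

for name, (j, f, listed) in reported.items():
    computed = j_and_f(j, f)
    print(f"{name}: computed {computed}, listed {listed}")
```

The DAVIS entry lists only a combined J&F value (65.6), so its J and F components cannot be checked this way.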
