Multi-Context Temporal Consistent Modeling for Referring Video Object Segmentation

Sun-Hyuk Choi, Hayoung Jo, Seong-Whan Lee

Abstract

Referring video object segmentation aims to segment the objects in a video that correspond to a given text description. Existing transformer-based temporal modeling approaches face two challenges: query inconsistency and limited consideration of context. Query inconsistency produces unstable masks that drift to different objects partway through the video. Limited consideration of context causes incorrect objects to be segmented, because the relationship between the given text and individual instances is not adequately modeled. To address these issues, we propose the Multi-context Temporal Consistency Module (MTCM), which consists of an Aligner and a Multi-Context Enhancer (MCE). The Aligner removes noise from queries and aligns them to achieve query consistency. The MCE predicts text-relevant queries by considering multiple contexts. We applied MTCM to four different models, improving performance across all of them and, in particular, achieving 47.6 J&F on MeViS. Code is available at https://github.com/Choi58/MTCM.
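
The abstract does not spell out the module internals, but a minimal sketch of how an Aligner-plus-MCE pipeline could be wired in PyTorch (the repository's framework) is shown below. All class names, tensor layouts, and attention choices here are illustrative assumptions inferred from the abstract, not the authors' implementation; the official code is at the GitHub link above.

```python
import torch
import torch.nn as nn


class Aligner(nn.Module):
    """Hypothetical sketch: denoise per-frame object queries and align
    them across time with temporal self-attention (layout assumed)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries: torch.Tensor) -> torch.Tensor:
        # queries: (num_objects, num_frames, dim) -- one query per object per frame
        q = self.norm(queries)
        aligned, _ = self.temporal_attn(q, q, q)  # each object attends over its frames
        return queries + aligned                  # residual keeps the original signal


class MultiContextEnhancer(nn.Module):
    """Hypothetical sketch: inject text context into the queries via
    cross-attention and weight them by predicted text relevance."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, queries: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # queries: (num_objects, num_frames, dim); text: (1, num_tokens, dim)
        n, t, d = queries.shape
        flat = queries.reshape(1, n * t, d)
        enhanced, _ = self.cross_attn(flat, text, text)  # attend to text tokens
        relevance = self.score(enhanced).sigmoid()       # per-query text relevance
        return (flat + relevance * enhanced).reshape(n, t, d)


# Toy usage: 5 object tracks over 8 frames, 7 text tokens, 256-dim features.
queries = torch.randn(5, 8, 256)
text = torch.randn(1, 7, 256)
out = MultiContextEnhancer(256)(Aligner(256)(queries), text)
print(out.shape)  # torch.Size([5, 8, 256])
```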

Code Repositories

choi58/mtcm (official implementation, PyTorch): https://github.com/Choi58/MTCM

Benchmarks

Benchmark: referring-video-object-segmentation-on-mevis
Methodology: DsHmp + MTCM
Metrics: F: 51.1, J: 44.1, J&F: 47.6
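
For context, J denotes region similarity (the Jaccard index, i.e. mask IoU) and F denotes contour accuracy, following the standard video object segmentation evaluation protocol; J&F is their arithmetic mean, consistent with the values above: (44.1 + 51.1) / 2 = 47.6.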
