Learning Motion-Appearance Co-Attention for Zero-Shot Video Object Segmentation
Xiaoxing Zhang, Shuo Wang, Huchuan Lu, Jinqing Qi, Lu Zhang, Shu Yang

Abstract
How to make appearance and motion information interact effectively in complex scenarios is a fundamental issue in flow-based zero-shot video object segmentation. In this paper, we propose an Attentive Multi-Modality Collaboration Network (AMC-Net) that uses appearance and motion information in a unified way. Specifically, AMC-Net fuses robust information from multi-modality features and promotes their collaboration in two stages. First, we propose a Multi-Modality Co-Attention Gate (MCG) on the bilateral encoder branches, in which a gate function formulates co-attention scores that balance the contributions of the multi-modality features and suppress redundant and misleading information. Then, we propose a Motion Correction Module (MCM) with a visual-motion attention mechanism, which emphasizes the features of foreground objects by incorporating the spatio-temporal correspondence between appearance and motion cues. Extensive experiments on three challenging public benchmark datasets verify that the proposed network performs favorably against existing state-of-the-art methods while training on less data.
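To make the gating idea in the abstract concrete, the following is a minimal NumPy sketch of a co-attention gate, not the authors' implementation. It assumes that the appearance and motion feature maps share the shape `(C, H, W)`, that the gate is a 1x1 convolution over their concatenation followed by a sigmoid, and that each branch is rescaled by its own score map. The function name `co_attention_gate` and the parameters `weight` and `bias` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def co_attention_gate(appearance, motion, weight, bias):
    """Hypothetical gate: re-weight two feature maps with per-pixel scores.

    appearance, motion: arrays of shape (C, H, W)
    weight: array of shape (2, 2C), an assumed 1x1 convolution kernel
    bias: array of shape (2,)
    """
    # Stack both modalities along the channel axis: (2C, H, W).
    fused = np.concatenate([appearance, motion], axis=0)
    # A 1x1 convolution is a per-pixel linear map over channels.
    logits = np.einsum("oc,chw->ohw", weight, fused) + bias[:, None, None]
    # Squash to (0, 1): one score map per modality, shape (2, H, W).
    scores = sigmoid(logits)
    gated_appearance = appearance * scores[0]
    gated_motion = motion * scores[1]
    return gated_appearance, gated_motion, scores


# Usage with random features (illustrative values only).
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
f_a = rng.standard_normal((C, H, W))
f_m = rng.standard_normal((C, H, W))
w = rng.standard_normal((2, 2 * C)) * 0.1
b = np.zeros(2)
out_a, out_m, s = co_attention_gate(f_a, f_m, w, b)
```

Because each score lies strictly between 0 and 1, a branch whose features are unreliable at a given pixel can be attenuated there without being removed entirely.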
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-video-object-segmentation-on-10 | AMC-Net | F: 84.6 G: 84.6 J: 84.5 |
| unsupervised-video-object-segmentation-on-11 | AMC-Net | J: 76.5 |
| unsupervised-video-object-segmentation-on-12 | AMC-Net | J: 71.1 |
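
The page does not define the metric letters. Assuming they follow the usual video object segmentation convention, J is region similarity (the Jaccard index, or intersection over union, between the predicted and ground-truth masks), F is boundary accuracy, and G is the mean of J and F. The sketch below computes J for a single frame. The function name `region_similarity` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np


def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


# Usage: the prediction covers two pixels, one of which is correct.
pred_mask = np.array([[1, 1], [0, 0]])
gt_mask = np.array([[1, 0], [0, 0]])
j_score = region_similarity(pred_mask, gt_mask)
```

Benchmark scores such as those in the table are typically reported as this per-frame value averaged over frames and sequences, expressed as a percentage.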