Command Palette
Search for a command to run...
Minhyeok Lee; Suhwan Cho; Dogyoon Lee; Chaewon Park; Jungho Lee; Sangyoun Lee

Abstract
Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the existence of complex backgrounds and multiple foreground objects make this task challenging. To address this issue, we propose a guided slot attention network to reinforce spatial structural information and obtain better foreground--background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot--template interaction and effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.
Code Repositories
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-video-object-segmentation-on-10 | GSANet | F: 89.6 G: 88.9 J: 88.3 |
| unsupervised-video-object-segmentation-on-11 | GSANet | J: 83.1 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.