5 months ago

BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

Ye Yu; Jialin Yuan; Gaurav Mittal; Li Fuxin; Mei Chen


Abstract

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods have shown significant performance improvements on semi-supervised VOS. However, existing work struggles to segment visually similar objects that are in close proximity to each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with the optical flow estimate to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then used in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space, considering both motion and appearance. Extensive experiments validate the effectiveness of the BATMAN architecture, which outperforms all existing state-of-the-art methods on all four popular VOS benchmarks: YouTube-VOS 2019 (85.0%), YouTube-VOS 2018 (85.3%), DAVIS 2017 Val/Test-dev (86.2%/82.2%), and DAVIS 2016 (92.5%).
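The abstract describes the bilateral attention only at a high level. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation: each query token attends only to reference tokens that are close both in image position and in a scalar motion encoding. The function name `bilateral_attention`, the scalar encodings `enc_q`/`enc_k`, and the two radius parameters are assumptions introduced here for illustration.

```python
import numpy as np

def bilateral_attention(q, k, v, pos_q, pos_k, enc_q, enc_k,
                        spatial_radius, motion_radius):
    """Masked dot-product attention (illustrative sketch only).

    q: (N, C) query features; k: (M, C) reference keys; v: (M, D) reference values.
    pos_q: (N, 2), pos_k: (M, 2) pixel coordinates of the tokens.
    enc_q: (N,), enc_k: (M,) hypothetical scalar motion encodings
    (for example, derived from calibrated optical flow).
    """
    q, k, v = (np.asarray(a, dtype=float) for a in (q, k, v))
    pos_q, pos_k = np.asarray(pos_q, dtype=float), np.asarray(pos_k, dtype=float)
    enc_q, enc_k = np.asarray(enc_q, dtype=float), np.asarray(enc_k, dtype=float)

    # Appearance similarity: scaled dot product between queries and keys.
    scores = q @ k.T / np.sqrt(q.shape[1])

    # Neighborhood in image space: square window around each query position.
    spatial_ok = np.abs(pos_q[:, None, :] - pos_k[None, :, :]).max(axis=-1) <= spatial_radius
    # Neighborhood in motion space: similar motion encoding.
    motion_ok = np.abs(enc_q[:, None] - enc_k[None, :]) <= motion_radius
    mask = spatial_ok & motion_ok

    # Softmax restricted to the allowed neighbors.
    masked = np.where(mask, scores, -1e9)
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked) * mask
    denom = weights.sum(axis=1, keepdims=True)
    # Queries with no allowed neighbor receive an all-zero output.
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ v

# Tiny demo: the third reference token is spatially close but moves differently,
# so it is excluded from the attention.
out = bilateral_attention(
    q=[[1.0, 0.0]],
    k=[[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    v=[[1.0], [2.0], [100.0]],
    pos_q=[[0, 0]],
    pos_k=[[0, 0], [1, 0], [0, 1]],
    enc_q=[0.0],
    enc_k=[0.0, 0.0, 5.0],
    spatial_radius=1,
    motion_radius=1,
)
print(out)
```

The joint mask is what separates this from plain local attention: a nearby reference token with similar appearance but different motion is ignored, which is the failure case the abstract highlights for visually similar objects in close proximity.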

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| video-object-segmentation-on-davis-2016 | KMN (val) | F-Score: 91.5; J&F: 90.5; Jaccard (Mean): 89.5 |
| video-object-segmentation-on-davis-2016 | AOT (val) | F-Score: 92.1; J&F: 91.1; Jaccard (Mean): 90.1 |
| video-object-segmentation-on-davis-2016 | RMN (val) | F-Score: 88.7; J&F: 88.8; Jaccard (Mean): 88.9 |
| video-object-segmentation-on-davis-2016 | CFBI (val) | F-Score: 90.5; J&F: 89.4; Jaccard (Mean): 88.3 |
| video-object-segmentation-on-davis-2016 | STCN (val) | F-Score: 92.5; J&F: 91.6; Jaccard (Mean): 90.8 |
| video-object-segmentation-on-davis-2016 | STM (val) | F-Score: 89.9; Jaccard (Mean): 88.7 |
| video-object-segmentation-on-davis-2016 | LCM (val) | F-Score: 91.4; J&F: 90.7; Jaccard (Mean): 89.9 |
| video-object-segmentation-on-davis-2016 | CFBI+ (val) | F-Score: 91.1; J&F: 89.9; Jaccard (Mean): 88.7 |
| video-object-segmentation-on-davis-2016 | TransVOS (val) | F-Score: 91.2; J&F: 90.5; Jaccard (Mean): 89.8 |
| video-object-segmentation-on-davis-2016 | BATMAN (val) | F-Score: 94.2; J&F: 92.5; Jaccard (Mean): 90.7 |
| video-object-segmentation-on-davis-2016 | RPCMVOS (val) | F-Score: 94; J&F: 90.6; Jaccard (Mean): 87.1 |
| video-object-segmentation-on-davis-2017-test-1 | RMN | F-measure: 78.1; Jaccard: 71.9 |
| video-object-segmentation-on-davis-2017-test-1 | TransVOS | F-measure: 80.9; Jaccard: 73; Mean Jaccard & F-Measure: 76.9 |
| video-object-segmentation-on-davis-2017-test-1 | CFBI+ | Jaccard: 71.6; Mean Jaccard & F-Measure: 75.6 |
| video-object-segmentation-on-davis-2017-test-1 | BATMAN | F-measure: 86.1; Jaccard: 78.4; Mean Jaccard & F-Measure: 82.2 |
| video-object-segmentation-on-davis-2017-test-1 | CFBI | F-measure: 78.7; Jaccard: 71.4; Mean Jaccard & F-Measure: 75 |
| video-object-segmentation-on-davis-2017-test-1 | LCM | F-measure: 81.8; Jaccard: 74.4; Mean Jaccard & F-Measure: 78.1 |
| video-object-segmentation-on-davis-2017-test-1 | STCN | F-measure: 79.6; Jaccard: 72.7; Mean Jaccard & F-Measure: 76.1 |
| video-object-segmentation-on-davis-2017-test-1 | KMN | F-measure: 80.3; Jaccard: 74.1; Mean Jaccard & F-Measure: 77.2 |
| video-object-segmentation-on-davis-2017-val | TransVOS | F-measure: 86.4; Jaccard: 81.4; Mean Jaccard & F-Measure: 83.9 |
| video-object-segmentation-on-davis-2017-val | AOT | F-measure: 87.5; Jaccard: 82.3; Mean Jaccard & F-Measure: 84.9 |
| video-object-segmentation-on-davis-2017-val | CFBI | F-measure: 84.5; Jaccard: 79.3; Mean Jaccard & F-Measure: 81.9 |
| video-object-segmentation-on-davis-2017-val | RMN | F-measure: 86; Jaccard: 81; Mean Jaccard & F-Measure: 83.5 |
| video-object-segmentation-on-davis-2017-val | STM | F-measure: 84.3; Jaccard: 79.2 |
| video-object-segmentation-on-davis-2017-val | LWL | F-measure: 84.1; Jaccard: 79.1; Mean Jaccard & F-Measure: 81.6 |
| video-object-segmentation-on-davis-2017-val | BATMAN | F-measure: 89.3; Mean Jaccard & F-Measure: 86.2 |
| video-object-segmentation-on-davis-2017-val | SST | F-measure: 85.1; Jaccard: 79.9; Mean Jaccard & F-Measure: 82.5 |
| video-object-segmentation-on-davis-2017-val | CFBI+ | F-measure: 85.7; Jaccard: 80.1; Mean Jaccard & F-Measure: 82.9 |
| video-object-segmentation-on-davis-2017-val | STCN | F-measure: 88.6; Jaccard: 82.2; Mean Jaccard & F-Measure: 85.4 |
| video-object-segmentation-on-davis-2017-val | LCM | F-measure: 86.5; Jaccard: 80.5 |
| video-object-segmentation-on-davis-2017-val | RPCMVOS | Jaccard: 81.3; Mean Jaccard & F-Measure: 83.7 |
| video-object-segmentation-on-davis-2017-val | KMN | F-measure: 85.6; Jaccard: 80; Mean Jaccard & F-Measure: 82.8 |
| video-object-segmentation-on-davis-2017-val | AFB-URR | F-measure: 76.1; Jaccard: 73; Mean Jaccard & F-Measure: 74.6 |
| video-object-segmentation-on-youtube-vos-1 | SST | Jaccard (Seen): 81.2; Jaccard (Unseen): 76; Mean Jaccard & F-Measure: 81.7 |
| video-object-segmentation-on-youtube-vos-1 | AFB-URR | F-Measure (Seen): 83.1; F-Measure (Unseen): 82.6; Jaccard (Seen): 78.8; Jaccard (Unseen): 74.1; Mean Jaccard & F-Measure: 79.6 |
| video-object-segmentation-on-youtube-vos-1 | KMN | F-Measure (Seen): 85.6; F-Measure (Unseen): 83.3; Jaccard (Seen): 81.4; Jaccard (Unseen): 75.3; Mean Jaccard & F-Measure: 81.4 |
| video-object-segmentation-on-youtube-vos-1 | TransVOS | F-Measure (Seen): 86.7; F-Measure (Unseen): 83.4; Jaccard (Seen): 82; Jaccard (Unseen): 75; Mean Jaccard & F-Measure: 81.8 |
| video-object-segmentation-on-youtube-vos-1 | LWL | F-Measure (Seen): 84.9; F-Measure (Unseen): 84.4; Jaccard (Seen): 80.4; Jaccard (Unseen): 76.4; Mean Jaccard & F-Measure: 81.5 |
| video-object-segmentation-on-youtube-vos-1 | RPCMVOS | F-Measure (Seen): 87.7; F-Measure (Unseen): 86.7; Jaccard (Seen): 83.1; Jaccard (Unseen): 78.5; Mean Jaccard & F-Measure: 84 |
| video-object-segmentation-on-youtube-vos-1 | AOT | F-Measure (Seen): 88.5; F-Measure (Unseen): 86.1; Jaccard (Seen): 83.7; Jaccard (Unseen): 78.1; Mean Jaccard & F-Measure: 84.1 |
| video-object-segmentation-on-youtube-vos-1 | STCN | F-Measure (Seen): 86.5; F-Measure (Unseen): 85.7; Jaccard (Seen): 81.9; Jaccard (Unseen): 77.9; Mean Jaccard & F-Measure: 83 |
| video-object-segmentation-on-youtube-vos-1 | RMN | F-Measure (Seen): 85.7; F-Measure (Unseen): 82.4; Jaccard (Seen): 82.1; Jaccard (Unseen): 75.7 |
| video-object-segmentation-on-youtube-vos-1 | CFBI | F-Measure (Seen): 85.8; Jaccard (Seen): 81.1 |
| video-object-segmentation-on-youtube-vos-1 | CFBI+ | F-Measure (Seen): 86.6; F-Measure (Unseen): 85.6; Jaccard (Seen): 81.8; Jaccard (Unseen): 77.1; Mean Jaccard & F-Measure: 82.8 |
| video-object-segmentation-on-youtube-vos-1 | STM | F-Measure (Seen): 84.2; F-Measure (Unseen): 80.9; Jaccard (Seen): 79.7; Jaccard (Unseen): 72.8; Mean Jaccard & F-Measure: 79.4 |
| video-object-segmentation-on-youtube-vos-1 | LCM | Jaccard (Seen): 82.2; Mean Jaccard & F-Measure: 82 |
| video-object-segmentation-on-youtube-vos-2019-2 | BATMAN | F-Measure (Seen): 89.3; F-Measure (Unseen): 87.2; Jaccard (Seen): 84.5; Jaccard (Unseen): 79; Mean Jaccard & F-Measure: 85 |
| video-object-segmentation-on-youtube-vos-2019-2 | CFBI | F-Measure (Seen): 85.1; F-Measure (Unseen): 83; Jaccard (Seen): 80.6; Jaccard (Unseen): 75.2; Mean Jaccard & F-Measure: 81 |
| visual-object-tracking-on-youtube-vos | TransVOS | F-Measure (Seen): 86.7; F-Measure (Unseen): 83.4 |
| visual-object-tracking-on-youtube-vos | KMN | Jaccard (Unseen): 75.3 |
| visual-object-tracking-on-youtube-vos | RMN | Jaccard (Unseen): 75.7 |
| visual-object-tracking-on-youtube-vos | CFBI | F-Measure (Unseen): 83.4 |
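
As a reading aid for the metrics listed above, here is a small Python sketch of how the combined scores relate to their components. It assumes the usual benchmark conventions: on DAVIS the combined score is the arithmetic mean of the Jaccard (region similarity) and F (contour accuracy) scores, and on YouTube-VOS it is the mean of the four seen/unseen Jaccard and F scores. The helper names `jaccard`, `mean_j_and_f`, and `youtube_vos_overall` are illustrative and not taken from any benchmark toolkit; the boundary-based F-measure itself is not implemented here.

```python
import numpy as np

def jaccard(pred_mask, gt_mask):
    """Region similarity J: intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match in this sketch.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_j_and_f(j, f):
    """Combined DAVIS-style score: arithmetic mean of J and F."""
    return (j + f) / 2.0

def youtube_vos_overall(j_seen, j_unseen, f_seen, f_unseen):
    """Combined YouTube-VOS-style score: mean of the four component scores."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4.0

# Component scores copied from the BATMAN rows listed above.
davis16 = mean_j_and_f(j=90.7, f=94.2)
davis17_test = mean_j_and_f(j=78.4, f=86.1)
ytvos19 = youtube_vos_overall(j_seen=84.5, j_unseen=79.0, f_seen=89.3, f_unseen=87.2)
print(davis16, davis17_test, ytvos19)
```

The recomputed means agree with the listed combined scores (92.5, 82.2, and 85) to within rounding of the one-decimal components.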
