Kin-Man Lam, Jianbing Shen, Wenguan Wang, Sanyuan Zhao, Hongmei Song

Abstract
This paper proposes a fast video salient object detection model based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM). A Pyramid Dilated Convolution (PDC) module is first designed to extract spatial features at multiple scales simultaneously. These spatial features are then concatenated and fed into an extended Deeper Bidirectional ConvLSTM (DB-ConvLSTM) to learn spatiotemporal information. Forward and backward ConvLSTM units are placed in two layers and connected in a cascaded way, which encourages information flow between the two directional streams and leads to deeper feature extraction. We further augment DB-ConvLSTM with a PDC-like structure by adopting several dilated DB-ConvLSTMs to extract multi-scale spatiotemporal information. Extensive experimental results show that our method outperforms previous video saliency models by a large margin, at a real-time speed of 20 fps on a single GPU. With unsupervised video object segmentation as an example application, the proposed model (with CRF-based post-processing) achieves state-of-the-art results on two popular benchmarks, demonstrating its superior performance and high applicability.
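
The multi-scale idea behind the PDC module can be illustrated with a toy sketch. This is not the paper's implementation: it works on a 1D signal in pure Python rather than on 2D feature maps, and the function names, kernel, and dilation rates are hypothetical choices made only for illustration. It shows the general pattern of running the same kernel at several dilation rates in parallel and keeping the branch outputs side by side, as a concatenation would.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 'same' 1D cross-correlation with the given dilation rate."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart, widening the receptive field.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def pyramid_dilated_features(signal, kernel, dilations=(1, 2, 4)):
    """Run one branch per dilation rate and stack the outputs as channels.

    The rates are illustrative assumptions, not values taken from the paper.
    """
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


# A unit impulse makes each branch's receptive field visible in its output.
signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
features = pyramid_dilated_features(signal, kernel, dilations=(1, 2))
for rate, branch in zip((1, 2), features):
    print(rate, branch)
```

With the impulse input, the rate-1 branch responds at three adjacent positions while the rate-2 branch responds at positions two samples apart, so each branch covers a different spatial extent with the same number of weights.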
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| unsupervised-video-object-segmentation-on-10 | PDB | F: 74.5 G: 75.9 J: 77.2 |
| unsupervised-video-object-segmentation-on-11 | PDB | J: 74.0 |
| unsupervised-video-object-segmentation-on-12 | PDB | J: 65.5 |
| unsupervised-video-object-segmentation-on-4 | PDB | F-measure (Mean): 57.0 F-measure (Recall): 60.2 J&F: 55.1 Jaccard (Mean): 53.2 Jaccard (Recall): 58.9 |
| unsupervised-video-object-segmentation-on-5 | PDB | F-measure (Decay): 3.7 F-measure (Mean): 43.0 F-measure (Recall): 44.6 J&F: 40.4 Jaccard (Decay): 4.0 Jaccard (Mean): 37.7 Jaccard (Recall): 42.6 |
| video-salient-object-detection-on-davis-2016 | PDB | AVERAGE MAE: 0.028 MAX E-MEASURE: 0.951 S-Measure: 0.882 |
| video-salient-object-detection-on-davsod | PDB | Average MAE: 0.114 S-Measure: 0.706 max E-Measure: 0.749 max F-Measure: 0.591 |
| video-salient-object-detection-on-davsod-1 | PDB | Average MAE: 0.132 S-Measure: 0.649 max E-measure: 0.698 |
| video-salient-object-detection-on-davsod-2 | PDB | Average MAE: 0.107 S-Measure: 0.608 max E-measure: 0.678 |
| video-salient-object-detection-on-fbms-59 | PDB | AVERAGE MAE: 0.064 MAX F-MEASURE: 0.821 S-Measure: 0.851 |
| video-salient-object-detection-on-mcl | PDB | AVERAGE MAE: 0.021 MAX E-MEASURE: 0.911 S-Measure: 0.856 |
| video-salient-object-detection-on-segtrack-v2 | PDB | AVERAGE MAE: 0.024 S-Measure: 0.864 max E-measure: 0.935 |
| video-salient-object-detection-on-uvsd | PDB | Average MAE: 0.018 S-Measure: 0.901 max E-measure: 0.975 |
| video-salient-object-detection-on-visal | PDB | Average MAE: 0.032 S-Measure: 0.907 max E-measure: 0.846 |
| video-salient-object-detection-on-vos-t | PDB | Average MAE: 0.078 S-Measure: 0.818 max E-measure: 0.837 |
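
For reading the MAE column, mean absolute error on a saliency map is commonly computed as the per-pixel absolute difference between the predicted map and the binary ground truth, averaged over all pixels, with lower values being better. The sketch below assumes both maps are scaled to [0, 1] and stored as nested lists. The function name and sample values are hypothetical, and the exact evaluation protocol behind the table (for example, how frames and videos are averaged) is not stated here.

```python
def mean_absolute_error(pred, gt):
    """Average per-pixel absolute difference between two equally sized 2D maps."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


# Hypothetical 2x2 prediction and binary ground-truth mask.
pred = [[0.9, 0.1],
        [0.8, 0.0]]
gt = [[1.0, 0.0],
      [1.0, 0.0]]
mae = mean_absolute_error(pred, gt)
print(round(mae, 3))
```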