SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation
Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer, Xavier Giro-i-Nieto

Abstract
Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, collecting large datasets for these tasks is expensive in annotation time, which represents a bottleneck. To address this, we propose SynthRef, a novel method for generating synthetic referring expressions for target objects in an image (or video frame), and we present and disseminate the first large-scale dataset with synthetic referring expressions for video object segmentation. Our experiments demonstrate that training with our synthetic referring expressions improves a model's ability to generalize across different datasets, without any additional annotation cost. Moreover, our formulation can be applied to any object detection or segmentation dataset.
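To make the idea concrete, below is a simplified, hypothetical sketch of template-based synthetic referring expression generation from detection annotations. It is not the paper's exact pipeline: the function names (`spatial_attribute`, `synthesize_expression`) and the specific templates are illustrative assumptions. The sketch combines the target's category name with a coarse spatial attribute derived from its bounding box relative to same-category objects in the image.

```python
# Hypothetical sketch: generate a referring expression for a target object
# from bounding boxes and class labels alone (no human annotation needed).

def spatial_attribute(boxes, target_idx, image_width):
    """Return a coarse location word for the target box among same-class boxes."""
    x_centers = [(b[0] + b[2]) / 2 for b in boxes]
    tx = x_centers[target_idx]
    if len(boxes) > 1 and tx == min(x_centers):
        return "leftmost"
    if len(boxes) > 1 and tx == max(x_centers):
        return "rightmost"
    return "left" if tx < image_width / 2 else "right"

def synthesize_expression(boxes, labels, target_idx, image_width):
    """Generate a synthetic referring expression for the target object."""
    label = labels[target_idx]
    # If the category occurs only once, the class name alone disambiguates.
    if labels.count(label) == 1:
        return f"the {label}"
    # Otherwise disambiguate by position among objects of the same class.
    same_class_boxes = [b for b, l in zip(boxes, labels) if l == label]
    same_class_ids = [i for i, l in enumerate(labels) if l == label]
    attr = spatial_attribute(
        same_class_boxes, same_class_ids.index(target_idx), image_width
    )
    if attr in ("leftmost", "rightmost"):
        return f"the {attr} {label}"
    return f"the {label} on the {attr}"

boxes = [(10, 20, 60, 90), (200, 30, 260, 100)]   # (x1, y1, x2, y2)
labels = ["person", "person"]
print(synthesize_expression(boxes, labels, 0, 320))  # → "the leftmost person"
```

Because such templates use only categories and box geometry, the same generator can be run over any existing detection or segmentation dataset at zero annotation cost.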
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| referring-expression-segmentation-on-davis | RefVOS | J&F 1st frame: 45.1 |
| referring-expression-segmentation-on-davis | RefVOS + SynthRef-YouTube-VIS | J&F 1st frame: 45.3, J&F Full video: 44.8 |
| referring-expression-segmentation-on-refer | RefVOS (Human REs) | Mean IoU: 39.5, Precision@0.5: 38.6, Precision@0.9: 6.9 |
| referring-expression-segmentation-on-refer | RefVOS (Synthetic REs) | Mean IoU: 35.0, Precision@0.5: 32.3, Precision@0.9: 1.8 |
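The metrics above are the standard ones for referring segmentation: mean IoU averages the mask overlap between prediction and ground truth over all samples, and Precision@K is the fraction of samples whose IoU exceeds the threshold K. A minimal sketch of these definitions (the function names are ours, not from the paper):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def precision_at(ious, threshold):
    """Fraction of samples whose IoU exceeds the given threshold."""
    return sum(iou > threshold for iou in ious) / len(ious)

# Toy example: predicted mask covers half of the ground-truth region.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True
print(mask_iou(pred, gt))                    # → 0.5
print(precision_at([0.6, 0.4, 0.8], 0.5))    # 2 of 3 samples pass
```

A high Precision@0.9 requires near-perfect masks, which is why it drops much faster than mean IoU when training on synthetic rather than human expressions.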