SynthRef: Generation of Synthetic Referring Expressions for Object Segmentation
Ioannis Kazakos, Carles Ventura, Miriam Bellver, Carina Silberer, Xavier Giro-i-Nieto

Abstract
Recent advances in deep learning have brought significant progress in visual grounding tasks such as language-guided video object segmentation. However, collecting large datasets for these tasks is expensive in annotation time, which represents a bottleneck. To address this, we propose SynthRef, a novel method for generating synthetic referring expressions for target objects in an image (or video frame), and we present and disseminate the first large-scale dataset with synthetic referring expressions for video object segmentation. Our experiments demonstrate that training with our synthetic referring expressions improves a model's ability to generalize across different datasets, without any additional annotation cost. Moreover, our formulation can be applied to any object detection or segmentation dataset.
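To make the idea concrete, below is a simplified, hypothetical sketch of template-based synthetic referring expression generation from detection annotations. It is not the paper's exact pipeline: the function names (`spatial_attribute`, `synthesize_expression`) and the specific templates are illustrative assumptions. The sketch combines the target's category name with a coarse spatial attribute derived from its bounding box relative to same-category objects in the image.

```python
# Hypothetical sketch: generate a referring expression for a target object
# from bounding boxes and class labels alone (no human annotation needed).

def spatial_attribute(boxes, target_idx, image_width):
    """Return a coarse location word for the target box among same-class boxes."""
    x_centers = [(b[0] + b[2]) / 2 for b in boxes]
    tx = x_centers[target_idx]
    if len(boxes) > 1 and tx == min(x_centers):
        return "leftmost"
    if len(boxes) > 1 and tx == max(x_centers):
        return "rightmost"
    return "left" if tx < image_width / 2 else "right"

def synthesize_expression(boxes, labels, target_idx, image_width):
    """Generate a synthetic referring expression for the target object."""
    label = labels[target_idx]
    # If the category occurs only once, the class name alone disambiguates.
    if labels.count(label) == 1:
        return f"the {label}"
    # Otherwise disambiguate by position among objects of the same class.
    same_class_boxes = [b for b, l in zip(boxes, labels) if l == label]
    same_class_ids = [i for i, l in enumerate(labels) if l == label]
    attr = spatial_attribute(
        same_class_boxes, same_class_ids.index(target_idx), image_width
    )
    if attr in ("leftmost", "rightmost"):
        return f"the {attr} {label}"
    return f"the {label} on the {attr}"

boxes = [(10, 20, 60, 90), (200, 30, 260, 100)]   # (x1, y1, x2, y2)
labels = ["person", "person"]
print(synthesize_expression(boxes, labels, 0, 320))  # → "the leftmost person"
```

Because such templates use only categories and box geometry, the same generator can be run over any existing detection or segmentation dataset at zero annotation cost.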
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| referring-expression-segmentation-on-davis | RefVOS | J&F 1st frame: 45.1 |
| referring-expression-segmentation-on-davis | RefVOS + SynthRef-YouTube-VIS | J&F 1st frame: 45.3, J&F Full video: 44.8 |
| referring-expression-segmentation-on-refer | RefVOS (Human REs) | Mean IoU: 39.5, Precision@0.5: 38.6, Precision@0.9: 6.9 |
| referring-expression-segmentation-on-refer | RefVOS (Synthetic REs) | Mean IoU: 35.0, Precision@0.5: 32.3, Precision@0.9: 1.8 |
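The metrics above are the standard ones for referring segmentation: mean IoU averages the mask overlap between prediction and ground truth over all samples, and Precision@K is the fraction of samples whose IoU exceeds the threshold K. A minimal sketch of these definitions (the function names are ours, not from the paper):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-union between two binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union > 0 else 0.0

def precision_at(ious, threshold):
    """Fraction of samples whose IoU exceeds the given threshold."""
    return sum(iou > threshold for iou in ious) / len(ious)

# Toy example: predicted mask covers half of the ground-truth region.
pred = np.zeros((4, 4), dtype=bool); pred[:2, :2] = True
gt = np.zeros((4, 4), dtype=bool); gt[:2, :] = True
print(mask_iou(pred, gt))                    # → 0.5
print(precision_at([0.6, 0.4, 0.8], 0.5))    # 2 of 3 samples pass
```

A high Precision@0.9 requires near-perfect masks, which is why it drops much faster than mean IoU when training on synthetic rather than human expressions.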