
Abstract

We propose a unified referring video object segmentation network (URVOS). URVOS takes a video and a referring expression as inputs and estimates the object masks referred to by the given language expression across all video frames. Our algorithm addresses this challenging problem by performing language-based object segmentation and mask propagation jointly, using a single deep neural network with a proper combination of two attention models. In addition, we construct the first large-scale referring video object segmentation dataset, called Refer-Youtube-VOS. We evaluate our model on two benchmark datasets, including ours, and demonstrate the effectiveness of the proposed approach. The dataset is released at https://github.com/skynbe/Refer-Youtube-VOS.

Joon-Young Lee Seonguk Seo Bohyung Han

URVOS: Unified Referring Video Object Segmentation Network with a Large-Scale Benchmark
