SeqTR: A Simple yet Universal Network for Visual Grounding

Chaoyang Zhu, Yiyi Zhou, Yunhang Shen, Gen Luo, Xingjia Pan, Mingbao Lin, Chao Chen, Liujuan Cao, Xiaoshuai Sun, Rongrong Ji

Abstract

In this paper, we propose a simple yet universal network, termed SeqTR, for visual grounding tasks such as phrase localization, referring expression comprehension (REC), and referring expression segmentation (RES). The canonical paradigms for visual grounding often require substantial expertise in designing network architectures and loss functions, which makes them hard to generalize across tasks. To simplify and unify the modeling, we cast visual grounding as a point prediction problem conditioned on image and text inputs, where either the bounding box or the binary mask is represented as a sequence of discrete coordinate tokens. Under this paradigm, visual grounding tasks are unified in our SeqTR network without task-specific branches or heads (e.g., the convolutional mask decoder for RES), which greatly reduces the complexity of multi-task modeling. In addition, SeqTR shares the same optimization objective, a simple cross-entropy loss, across all tasks, which removes the need for hand-crafted loss functions. Experiments on five benchmark datasets demonstrate that the proposed SeqTR outperforms (or is on par with) existing state-of-the-art methods, showing that a simple yet universal approach to visual grounding is indeed feasible. Source code is available at https://github.com/sean-zhuh/SeqTR.
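To make the "sequence of discrete coordinate tokens" idea concrete, here is a minimal sketch of how a bounding box could be quantized into integer tokens and scored with a cross-entropy loss. It is an illustration under stated assumptions, not the authors' implementation: the bin count `NUM_BINS`, the function names, and the choice to normalize by image size and decode at bin centers are all hypothetical and do not come from the abstract.

```python
import math

NUM_BINS = 1000  # assumed size of the coordinate vocabulary; not stated in the abstract


def box_to_tokens(box, img_w, img_h, num_bins=NUM_BINS):
    """Quantize a pixel-space box (x1, y1, x2, y2) into four integer tokens."""
    x1, y1, x2, y2 = box
    normalized = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
    # Clamp each value to [0, 1], then map it to one of num_bins discrete bins.
    return [min(num_bins - 1, int(max(0.0, min(1.0, v)) * num_bins)) for v in normalized]


def tokens_to_box(tokens, img_w, img_h, num_bins=NUM_BINS):
    """Map tokens back to pixel coordinates using the center of each bin."""
    scales = (img_w, img_h, img_w, img_h)
    return tuple((t + 0.5) / num_bins * s for t, s in zip(tokens, scales))


def token_cross_entropy(logits, target):
    """Cross-entropy of one target token given unnormalized scores over all bins."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    tokens = box_to_tokens((160, 120, 480, 360), img_w=640, img_h=480)
    print(tokens)
    print(tokens_to_box(tokens, img_w=640, img_h=480))
```

Under the same assumptions, a segmentation mask could be handled by sampling points along its contour and quantizing each (x, y) pair in the same way, so that both tasks reduce to predicting a token sequence trained with one loss.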

Code Repositories

luogen1996/simrec (pytorch; mentioned in GitHub)
sean-zhuh/seqtr (official; pytorch; mentioned in GitHub)
seanzhuh/seqtr (pytorch; mentioned in GitHub)

Benchmarks

Benchmark                                        Methodology  Metrics
referring-expression-segmentation-on-refcoco-8   SeqTR        Overall IoU: 69.79
referring-expression-segmentation-on-refcoco-9   SeqTR        Overall IoU: 64.12
