GRiT: A Generative Region-to-text Transformer for Object Understanding

Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, Lijuan Wang

Abstract

This paper presents a Generative RegIon-to-Text transformer, GRiT, for object understanding. The spirit of GRiT is to formulate object understanding as <region, text> pairs, where the region localizes an object and the text describes it. For example, in object detection the text denotes class names, while in dense captioning it consists of descriptive sentences. Specifically, GRiT comprises a visual encoder to extract image features, a foreground object extractor to localize objects, and a text decoder to generate open-set object descriptions. With the same model architecture, GRiT can understand objects not only via simple nouns but also via rich descriptive sentences that include object attributes or actions. Experimentally, we apply GRiT to object detection and dense captioning tasks. GRiT achieves 60.4 AP on COCO 2017 test-dev for object detection and 15.5 mAP on Visual Genome for dense captioning. Code is available at https://github.com/JialianW/GRiT.
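The three-part pipeline the abstract describes (visual encoder, foreground object extractor, text decoder) can be sketched in a few lines of PyTorch. The sketch below is only an illustration of that structure, not the authors' implementation; the class and module names (GRiTSketch, region_head, etc.), dimensions, and layer choices are all assumptions made for brevity. See the official repository for the real code.

```python
# Minimal sketch of the three-part GRiT pipeline from the abstract:
# visual encoder -> foreground object extractor -> text decoder.
# All module choices and dimensions are illustrative assumptions;
# the official implementation is at https://github.com/JialianW/GRiT.
import torch
import torch.nn as nn

class GRiTSketch(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        # 1) Visual encoder: extracts image features. The paper uses a
        #    ViT backbone; a single patch-embedding conv stands in here.
        self.visual_encoder = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # 2) Foreground object extractor: predicts a class-agnostic box
        #    (4 coords) plus an objectness score (1) per feature token.
        self.region_head = nn.Linear(d_model, 4 + 1)
        # 3) Text decoder: autoregressively generates an open-set text
        #    description (class name or sentence) from visual features.
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.text_decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.token_embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, text_tokens):
        # images: (B, 3, H, W); text_tokens: (B, T) teacher-forced tokens.
        feats = self.visual_encoder(images).flatten(2).transpose(1, 2)  # (B, N, D)
        boxes_scores = self.region_head(feats)  # (B, N, 5): box + objectness
        # Decode text conditioned on the visual features (a real system
        # would pool features per detected region before decoding).
        tok = self.token_embed(text_tokens)  # (B, T, D)
        mask = nn.Transformer.generate_square_subsequent_mask(tok.size(1))
        hidden = self.text_decoder(tok, feats, tgt_mask=mask)
        return boxes_scores, self.lm_head(hidden)  # logits: (B, T, vocab)

# Shape check with random inputs.
model = GRiTSketch()
boxes_scores, logits = model(torch.randn(2, 3, 224, 224),
                             torch.randint(0, 30522, (2, 12)))
print(boxes_scores.shape, logits.shape)  # (2, 196, 5) and (2, 12, 30522)
```

Because the decoder's output is free-form text rather than a fixed class logit, the same architecture can emit either class names (object detection) or full descriptive sentences (dense captioning), which is the key design point of the paper.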

Code Repositories

JialianW/GRiT (official, PyTorch): https://github.com/JialianW/GRiT

Benchmarks

Benchmark                           Methodology                          Metrics
dense-captioning-on-visual-genome   GRiT (ViT-B)                         mAP: 15.5
object-detection-on-coco            GRiT (ViT-H, single-scale testing)   box mAP: 60.4
object-detection-on-coco-o          GRiT (ViT-H)                         Average mAP: 42.9; Effective Robustness: 15.72
