3 months ago

ViDT: An Efficient and Effective Fully Transformer-based Object Detector

Hwanjun Song, Deqing Sun, Sanghyuk Chun, Varun Jampani, Dongyoon Han, Byeongho Heo, Wonjae Kim, Ming-Hsuan Yang

Abstract

Transformers are transforming the landscape of computer vision, especially for recognition tasks. Detection transformers are the first fully end-to-end learning systems for object detection, while vision transformers are the first fully transformer-based architecture for image classification. In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector. ViDT introduces a reconfigured attention module to extend the recent Swin Transformer to be a standalone object detector, followed by a computationally efficient transformer decoder that exploits multi-scale features and auxiliary techniques essential to boost the detection performance without much increase in computational load. Extensive evaluation results on the Microsoft COCO benchmark dataset demonstrate that ViDT obtains the best AP and latency trade-off among existing fully transformer-based object detectors, and achieves 49.2 AP owing to its high scalability for large models. We will release the code and trained models at https://github.com/naver-ai/vidt.

Code Repositories

naver-ai/vidt (official, PyTorch)

Benchmarks

Benchmark: object detection on COCO 2017 val

| Method | AP | AP50 | AP75 | APS | APM | APL | Params |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ViDT Swin-base | 49.2 | 69.4 | 53.1 | 30.6 | 52.6 | 66.9 | 0.1B |
| ViDT Swin-small | 47.5 | 67.7 | 51.4 | 29.2 | 50.7 | 64.8 | 61M |
| ViDT Swin-nano | 40.4 | 59.6 | 43.3 | 23.2 | 42.5 | 55.8 | 16M |
| ViDT Swin-tiny | 44.8 | 64.5 | 48.7 | 25.9 | 47.6 | 62.1 | 38M |
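
The benchmark figures listed above can be used to see how AP scales with model size. The following is a minimal Python sketch, not part of the ViDT code base: the dictionary `RESULTS` and the helper `ap_gain_per_step` are hypothetical names introduced here for illustration, and the Swin-base parameter count, listed only as 0.1B, is assumed to be roughly 100M.

```python
# AP and parameter counts copied from the COCO 2017 val listing.
# params_m is in millions. The Swin-base entry is listed as 0.1B;
# treating it as ~100M is an assumption, since the listing is rounded.
RESULTS = {
    "ViDT Swin-nano": {"params_m": 16, "ap": 40.4},
    "ViDT Swin-tiny": {"params_m": 38, "ap": 44.8},
    "ViDT Swin-small": {"params_m": 61, "ap": 47.5},
    "ViDT Swin-base": {"params_m": 100, "ap": 49.2},
}


def ap_gain_per_step(results):
    """For models sorted by size, return one tuple per consecutive pair:
    (smaller model, larger model, AP gain, added parameters in millions)."""
    ordered = sorted(results.items(), key=lambda kv: kv[1]["params_m"])
    steps = []
    for (name_a, a), (name_b, b) in zip(ordered, ordered[1:]):
        ap_gain = round(b["ap"] - a["ap"], 1)
        added_params = b["params_m"] - a["params_m"]
        steps.append((name_a, name_b, ap_gain, added_params))
    return steps


if __name__ == "__main__":
    for small, large, gain, added in ap_gain_per_step(RESULTS):
        print(f"{small} -> {large}: +{gain} AP for +{added}M params")
```

Under these assumptions, each step up in backbone size still improves AP, but by a smaller margin per added parameter. The last step is only approximate because of the rounded Swin-base count.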
