
DPT: Deformable Patch-based Transformer for Visual Recognition

Zhiyang Chen Yousong Zhu Chaoyang Zhao Guosheng Hu Wei Zeng Jinqiao Wang Ming Tang

Abstract

Transformer has achieved great success in computer vision, but how to split an image into patches remains a problem. Existing methods usually use fixed-size patch embedding, which might destroy the semantics of objects. To address this problem, we propose a new Deformable Patch (DePatch) module which learns to adaptively split an image into patches with different positions and scales in a data-driven way rather than using predefined fixed patches. In this way, our method can well preserve the semantics in patches. The DePatch module works as a plug-and-play component that can easily be incorporated into different transformers to enable end-to-end training. We term this DePatch-embedded transformer the Deformable Patch-based Transformer (DPT) and conduct extensive evaluations of DPT on image classification and object detection. Results show DPT achieves 81.9% top-1 accuracy on ImageNet classification, and 43.7% box mAP with RetinaNet and 44.3% with Mask R-CNN on MSCOCO object detection. Code has been made available at: https://github.com/CASIA-IVA-Lab/DPT .
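The core idea is a patch embedding that, instead of cutting the image on a fixed grid, predicts a per-patch center offset and scale and samples pixels from the resulting region before projecting them to a token. Below is a minimal PyTorch sketch of that idea; the module name, the k x k sampling grid, and the offset/scale prediction head are illustrative assumptions rather than the authors' exact implementation (see the official repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformablePatchEmbed(nn.Module):
    """Patch embedding with learned per-patch position offsets and scales.
    Illustrative sketch of the DePatch idea, not the official DPT code."""

    def __init__(self, in_chans=3, embed_dim=96, patch_size=4, k=3):
        super().__init__()
        self.patch_size = patch_size
        self.k = k  # sampling points per patch along each axis
        # Predicts 4 numbers per patch: (dx, dy) center offset, (sw, sh) scale.
        self.pred = nn.Conv2d(in_chans, 4, kernel_size=patch_size, stride=patch_size)
        # Projects the k*k sampled pixels of a patch to the token embedding.
        self.proj = nn.Linear(in_chans * k * k, embed_dim)

    def forward(self, x):
        B, C, H, W = x.shape
        ph, pw = H // self.patch_size, W // self.patch_size

        params = self.pred(x)                       # (B, 4, ph, pw)
        offset = torch.tanh(params[:, :2])          # offsets in [-1, 1]
        scale = torch.sigmoid(params[:, 2:]) + 0.5  # scales in [0.5, 1.5]

        # Regular grid of default patch centers in normalized [-1, 1] coords.
        ys = torch.linspace(-1, 1, ph, device=x.device)
        xs = torch.linspace(-1, 1, pw, device=x.device)
        cy, cx = torch.meshgrid(ys, xs, indexing="ij")
        centers = torch.stack([cx, cy], dim=0).unsqueeze(0)  # (1, 2, ph, pw)
        centers = centers + offset / max(ph, pw)             # shifted centers

        # k x k relative sampling grid, rescaled by the predicted patch size.
        rel = torch.linspace(-1, 1, self.k, device=x.device)
        ry, rx = torch.meshgrid(rel, rel, indexing="ij")
        rel_grid = torch.stack([rx, ry], dim=-1).view(1, 1, 1, self.k * self.k, 2)
        half = scale.permute(0, 2, 3, 1).unsqueeze(3) / max(ph, pw)  # half extents
        grid = centers.permute(0, 2, 3, 1).unsqueeze(3) + rel_grid * half
        grid = grid.reshape(B, ph, pw * self.k * self.k, 2)

        # Bilinear sampling at the deformed locations, then linear projection.
        sampled = F.grid_sample(x, grid, align_corners=True)  # (B, C, ph, pw*k*k)
        sampled = sampled.view(B, C, ph, pw, self.k * self.k)
        sampled = sampled.permute(0, 2, 3, 1, 4).reshape(B, ph * pw, C * self.k * self.k)
        return self.proj(sampled)                              # (B, ph*pw, embed_dim)
```

For a 224x224 RGB input with patch_size=4, this produces 56 x 56 = 3136 tokens of dimension embed_dim, which can feed the first transformer stage in place of a standard fixed-grid patch embedding; this drop-in replacement is what makes the module plug-and-play.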

Code Repositories

CASIA-IVA-Lab/DPT (official, PyTorch): https://github.com/CASIA-IVA-Lab/DPT

Benchmarks

Benchmark: Semantic Segmentation on DensePASS
Methodology: DPT (MiT-B1)
Metrics: mIoU: 36.50%
