3 months ago

Conviformers: Convolutionally guided Vision Transformer

Mohit Vaishnav, Thomas Fel, Ivań Felipe Rodríguez, Thomas Serre

Abstract

Vision transformers are now the de facto choice for image classification. Classification tasks fall into two broad categories, fine-grained and coarse-grained. Fine-grained classification requires discovering subtle differences, because sub-classes are highly similar to one another. Such distinctions are often lost when images are downscaled to save the memory and computational cost associated with vision transformers (ViT). In this work, we present an in-depth analysis and describe the critical components for developing a system for the fine-grained categorization of plants from herbarium sheets. Our extensive experimental analysis indicated the need for a better augmentation technique and for neural networks that can handle higher-dimensional images. We also introduce a convolutional transformer architecture called Conviformer which, unlike the popular Vision Transformer (ConViT), can handle higher-resolution images without exploding memory and computational cost. We also introduce a novel, improved pre-processing technique called PreSizer, which resizes images while preserving their original aspect ratios; this proved essential for classifying natural plants. With our simple yet effective approach, we achieved state-of-the-art (SoTA) results on the Herbarium 202x and iNaturalist 2019 datasets.
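The abstract does not spell out how PreSizer works, so the following is not the authors' implementation. It is only a minimal sketch of one common way to resize without distorting the aspect ratio: pad the shorter side with mirrored pixels until the image is square, then downscale. All function names here are hypothetical, the image is a plain 2D list of pixel values, and nearest-neighbour sampling stands in for whatever interpolation a real pipeline would use.

```python
def square_padding(width, height):
    """Return (left, top, right, bottom) padding that makes the image square."""
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    left, top = pad_w // 2, pad_h // 2
    return left, top, pad_w - left, pad_h - top


def reflect_index(i, n):
    """Map any integer index onto [0, n) by mirroring at the borders."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i


def pad_to_square(image):
    """Pad a 2D list of pixels to a square using mirrored borders (assumed strategy)."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = square_padding(width, height)
    return [
        [image[reflect_index(r - top, height)][reflect_index(c - left, width)]
         for c in range(width + left + right)]
        for r in range(height + top + bottom)
    ]


def resize_nearest(image, size):
    """Nearest-neighbour resize of a square 2D list to size x size."""
    side = len(image)
    return [
        [image[r * side // size][c * side // size] for c in range(size)]
        for r in range(size)
    ]


# A 2x4 "image": padding makes it 4x4, so the later resize scales both axes equally.
wide = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
square = pad_to_square(wide)
small = resize_nearest(square, 2)
```

Because the padding happens before the resize, the original content is scaled by the same factor along both axes, which is the property the abstract describes as essential for herbarium sheets.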

Code Repositories

vaishnavmohit/Conviformer (official, PyTorch)

Benchmarks

| Benchmark | Methodology | Metrics |
| --- | --- | --- |
| fine-grained-image-classification-on-4 | Conviformer-B | Test F1 score: .719 |
| fine-grained-image-classification-on-5 | Conviformer-B | Test F1 score (private): .868 |
| image-classification-on-inaturalist-2019 | Conviformer-B | Top-1 Accuracy: 82.85 |
