5 months ago

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Abstract

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.
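
The abstract names MAE (masked autoencoder) pretraining as the pretext task but does not describe it. As general background, MAE-style pretraining hides a random subset of image patches and trains the model to reconstruct them from the visible ones. The snippet below is a minimal sketch of only the random masking step, not the authors' implementation: the function name `random_patch_mask`, the 196-patch grid, and the 0.75 mask ratio are illustrative assumptions that do not come from this page.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) lists by random sampling.

    Hypothetical helper for illustration; not part of the Hiera codebase.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])   # patches hidden from the encoder
    visible = sorted(order[num_masked:])  # patches the encoder actually sees
    return visible, masked


# Assumed setup: a 14 x 14 grid of patches (196 total) and a 0.75 mask ratio.
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Because the encoder only processes the visible indices, a higher mask ratio reduces the number of tokens handled per pretraining step.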

Benchmarks

Benchmark                                   Methodology               Metrics
action-classification-on-kinetics-400       Hiera-H (no extra data)   Acc@1: 87.8
action-classification-on-kinetics-600       Hiera-H (no extra data)   Top-1 Accuracy: 88.8
action-classification-on-kinetics-700       Hiera-H (no extra data)   Top-1 Accuracy: 81.1
action-recognition-in-videos-on-something   Hiera-L (no extra data)   Top-1 Accuracy: 76.5
action-recognition-on-ava-v2-2              Hiera-H (K700 PT+FT)      mAP: 43.3
image-classification-on-imagenet            Hiera-H                   Top 1 Accuracy: 86.9%
image-classification-on-inaturalist         Hiera-H (448px)           Top 1 Accuracy: 83.8
image-classification-on-inaturalist-2018    Hiera-H (448px)           Top-1 Accuracy: 87.3%
image-classification-on-inaturalist-2019    Hiera-H (448px)           Top-1 Accuracy: 88.5
image-classification-on-places365-standard  Hiera-H (448px)           Top 1 Accuracy: 60.6
instance-segmentation-on-coco-minival       Hiera-L                   mask AP: 48.6
object-detection-on-coco-minival            Hiera-L                   box AP: 55
