HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices
Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

Abstract
Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources they require, these architectures are rarely deployed on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) whose search space includes hybrid convolution-attention architectures, targeting vision tasks on tiny devices. HyT-NAS improves on state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a comparable hypervolume with roughly 5x fewer training evaluations. Our resulting architecture outperforms MLPerf's MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake Words.
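To make the multi-objective HW-NAS formulation concrete, the sketch below shows a minimal, illustrative search loop over a toy hybrid (convolution/attention) search space: candidates are scored by stand-in accuracy and latency predictors, the non-dominated set is extracted, and the 2-D hypervolume is computed against a reference point. The block names, the toy predictors, the random sampling, and the reference point are all assumptions for illustration; HyT-NAS's actual search space, surrogate predictors, and search strategy are described in the paper.

```python
import random

# Toy hybrid search space: each candidate is a sequence of block choices.
# Block names and the number of blocks are illustrative, not HyT-NAS's actual space.
BLOCK_CHOICES = ["conv3x3", "conv5x5", "mbconv", "attention"]

def sample_architecture(num_blocks=6):
    """Randomly sample a candidate hybrid architecture (placeholder search strategy)."""
    return [random.choice(BLOCK_CHOICES) for _ in range(num_blocks)]

def predict_accuracy(arch):
    """Stand-in for a learned accuracy predictor (surrogate)."""
    attn = arch.count("attention")
    return 0.80 + 0.03 * min(attn, 3) + random.uniform(-0.01, 0.01)

def predict_latency_ms(arch):
    """Stand-in for a hardware latency predictor for the target edge device."""
    cost = {"conv3x3": 1.0, "conv5x5": 1.8, "mbconv": 1.2, "attention": 2.5}
    return sum(cost[b] for b in arch)

def pareto_front(points):
    """Keep points not dominated under (maximize accuracy, minimize latency)."""
    front = [
        (acc, lat)
        for acc, lat in points
        if not any(a >= acc and l <= lat and (a, l) != (acc, lat) for a, l in points)
    ]
    return sorted(front, key=lambda p: p[1])

def hypervolume_2d(front, ref_acc=0.0, ref_lat=20.0):
    """2-D hypervolume dominated by the front w.r.t. an assumed reference point."""
    hv, prev_acc = 0.0, ref_acc
    # On a Pareto front sorted by ascending latency, accuracy is also ascending.
    for acc, lat in sorted(front, key=lambda p: p[1]):
        hv += (acc - prev_acc) * (ref_lat - lat)
        prev_acc = acc
    return hv

if __name__ == "__main__":
    random.seed(0)
    budget = 50  # number of predictor-evaluated candidates
    evaluated = [(predict_accuracy(a), predict_latency_ms(a))
                 for a in (sample_architecture() for _ in range(budget))]
    front = pareto_front(evaluated)
    print("Pareto front (accuracy, latency ms):", front)
    print("Hypervolume:", round(hypervolume_2d(front), 3))
```

A better-guided search strategy (e.g. surrogate-assisted Bayesian or evolutionary optimization, as HW-NAS methods typically use) would replace the random sampler while keeping the same Pareto-front and hypervolume bookkeeping.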
Benchmarks
| Benchmark | Model | Accuracy (%) |
|---|---|---|
| image-classification-on-visual-wake-words | ProxylessNAS | 86.55 |
| image-classification-on-visual-wake-words | MobileNetV1 | 83.7 |
| image-classification-on-visual-wake-words | HyT-NAS-BA | 92.25 |
| image-classification-on-visual-wake-words | MobileNetV2 (x0.35) | 86.34 |