HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices
Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

Abstract
Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources they require, these architectures are rarely deployed on resource-constrained platforms. Current research investigates hybrid handcrafted convolution-based and attention-based models for CV tasks such as image classification and object detection. In this paper, we propose HyT-NAS, an efficient Hardware-aware Neural Architecture Search (HW-NAS) whose search space includes hybrid convolution-attention architectures, targeting vision tasks on tiny devices. HyT-NAS improves on state-of-the-art HW-NAS by enriching the search space and enhancing the search strategy as well as the performance predictors. Our experiments show that HyT-NAS achieves a comparable hypervolume with roughly 5x fewer training evaluations. Our resulting architecture outperforms MLPerf's MobileNetV1 by 6.3% in accuracy with 3.5x fewer parameters on Visual Wake Words.
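To make the multi-objective HW-NAS formulation concrete, the sketch below shows a minimal, illustrative search loop over a toy hybrid (convolution/attention) search space: candidates are scored by stand-in accuracy and latency predictors, the non-dominated set is extracted, and the 2-D hypervolume is computed against a reference point. The block names, the toy predictors, the random sampling, and the reference point are all assumptions for illustration; HyT-NAS's actual search space, surrogate predictors, and search strategy are described in the paper.

```python
import random

# Toy hybrid search space: each candidate is a sequence of block choices.
# Block names and the number of blocks are illustrative, not HyT-NAS's actual space.
BLOCK_CHOICES = ["conv3x3", "conv5x5", "mbconv", "attention"]

def sample_architecture(num_blocks=6):
    """Randomly sample a candidate hybrid architecture (placeholder search strategy)."""
    return [random.choice(BLOCK_CHOICES) for _ in range(num_blocks)]

def predict_accuracy(arch):
    """Stand-in for a learned accuracy predictor (surrogate)."""
    attn = arch.count("attention")
    return 0.80 + 0.03 * min(attn, 3) + random.uniform(-0.01, 0.01)

def predict_latency_ms(arch):
    """Stand-in for a hardware latency predictor for the target edge device."""
    cost = {"conv3x3": 1.0, "conv5x5": 1.8, "mbconv": 1.2, "attention": 2.5}
    return sum(cost[b] for b in arch)

def pareto_front(points):
    """Keep points not dominated under (maximize accuracy, minimize latency)."""
    front = [
        (acc, lat)
        for acc, lat in points
        if not any(a >= acc and l <= lat and (a, l) != (acc, lat) for a, l in points)
    ]
    return sorted(front, key=lambda p: p[1])

def hypervolume_2d(front, ref_acc=0.0, ref_lat=20.0):
    """2-D hypervolume dominated by the front w.r.t. an assumed reference point."""
    hv, prev_acc = 0.0, ref_acc
    # On a Pareto front sorted by ascending latency, accuracy is also ascending.
    for acc, lat in sorted(front, key=lambda p: p[1]):
        hv += (acc - prev_acc) * (ref_lat - lat)
        prev_acc = acc
    return hv

if __name__ == "__main__":
    random.seed(0)
    budget = 50  # number of predictor-evaluated candidates
    evaluated = [(predict_accuracy(a), predict_latency_ms(a))
                 for a in (sample_architecture() for _ in range(budget))]
    front = pareto_front(evaluated)
    print("Pareto front (accuracy, latency ms):", front)
    print("Hypervolume:", round(hypervolume_2d(front), 3))
```

A better-guided search strategy (e.g. surrogate-assisted Bayesian or evolutionary optimization, as HW-NAS methods typically use) would replace the random sampler while keeping the same Pareto-front and hypervolume bookkeeping.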
Benchmarks
| Benchmark | Model | Accuracy (%) |
|---|---|---|
| image-classification-on-visual-wake-words | ProxylessNAS | 86.55 |
| image-classification-on-visual-wake-words | MobileNetV1 | 83.7 |
| image-classification-on-visual-wake-words | HyT-NAS-BA | 92.25 |
| image-classification-on-visual-wake-words | MobileNetV2 (x0.35) | 86.34 |