Parameter Prediction for Unseen Deep Architectures

Boris Knyazev; Michal Drozdzal; Graham W. Taylor; Adriana Romero-Soriano

Abstract

Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study if we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures - DeepNets-1M - and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50 achieving a 60% accuracy on CIFAR-10. On ImageNet, top-5 accuracy of some of our networks approaches 50%. Our task along with the model and results can potentially lead to a new, more computationally efficient paradigm of training networks. Our model also learns a strong representation of neural architectures enabling their analysis.
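The core idea in the abstract, a hypernetwork that emits the parameters of another network in one forward pass, can be illustrated with a deliberately tiny sketch. This is not GHN-2 and not the code in the official repository: it has no graph neural network, it is untrained, and every name in it (`make_hypernet`, `embed_layer`, `predict_parameters`, `forward`) is hypothetical. It only shows the mechanics: each layer of a target architecture is turned into an embedding, a shared mapping converts that embedding into a flat vector, and the vector is sliced and reshaped into that layer's weights.

```python
import math
import random


def make_hypernet(embed_dim, max_params, seed=0):
    """Toy linear 'hypernetwork': a fixed (max_params x embed_dim) matrix.

    Assumption: a real model would be trained; here the entries are random.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(embed_dim)
    return [[rng.gauss(0.0, scale) for _ in range(embed_dim)]
            for _ in range(max_params)]


def embed_layer(index, fan_in, fan_out, embed_dim):
    """Hypothetical hand-made embedding of a layer from its position and shape."""
    feats = [float(index), float(fan_in), float(fan_out)]
    return [math.sin((k + 1) * feats[k % 3]) for k in range(embed_dim)]


def predict_parameters(hypernet, layer_shapes, embed_dim):
    """One pass over the architecture: predict a weight matrix per layer."""
    predicted = []
    for i, (fan_in, fan_out) in enumerate(layer_shapes):
        emb = embed_layer(i, fan_in, fan_out, embed_dim)
        n = fan_in * fan_out
        # Map the embedding to a flat vector, keeping only the n values needed.
        flat = [sum(w * e for w, e in zip(row, emb)) for row in hypernet[:n]]
        # Reshape into fan_out rows of fan_in weights each.
        predicted.append([flat[r * fan_in:(r + 1) * fan_in]
                          for r in range(fan_out)])
    return predicted


def forward(weights_per_layer, x):
    """Run the target MLP with the predicted weights (ReLU on hidden layers)."""
    last = len(weights_per_layer) - 1
    for i, W in enumerate(weights_per_layer):
        x = [sum(w * v for w, v in zip(row, x)) for row in W]
        if i < last:
            x = [max(0.0, v) for v in x]
    return x


EMBED_DIM = 8
arch = [(4, 6), (6, 3)]  # (fan_in, fan_out) of each layer in the target network
hypernet = make_hypernet(EMBED_DIM, max_params=max(i * o for i, o in arch))
params = predict_parameters(hypernet, arch, EMBED_DIM)
out = forward(params, [1.0, 0.5, -0.5, 2.0])
```

Because the same `hypernet` is reused for any list of layer shapes that fits within `max_params`, a different `arch` yields a new set of weights without any gradient steps on the target network. In the paper's setting the predictor is trained on many architectures so that the predicted weights are useful rather than random.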

Code Repositories

facebookresearch/ppuda (Official, PyTorch)

Benchmarks

Benchmark: parameter-prediction-on-cifar10
Methodology: GHN-2
Metrics:
Classification Accuracy (BN-free): 36.8
Classification Accuracy (Deep): 60.5
Classification Accuracy (Dense): 65.8
Classification Accuracy (ID-test): 66.9
Classification Accuracy (ResNet-50): 58.6
Classification Accuracy (ViT): 11.4
Classification Accuracy (Wide): 64
