Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization

Yaoming Wang1, Yuchen Liu1, Wenrui Dai2, Chenglin Li1, Junni Zou2, Hongkai Xiong1

1Department of Electronic Engineering, Shanghai Jiao Tong University, China
2Department of Computer Science & Engineering, Shanghai Jiao Tong University, China

Abstract

Existing differentiable neural architecture search (NAS) approaches simply assume that the architectural distributions on different edges are independent of one another, which conflicts with the intrinsic properties of an architecture. In this paper, we instead view the architectural distribution as the latent representation of specific data points. We then propose Variational Information Maximization Neural Architecture Search (VIM-NAS), which leverages a simple yet effective convolutional neural network to model the latent representation and optimizes a tractable variational lower bound on the mutual information between the data points and the latent representations. VIM-NAS automatically learns a nearly one-hot distribution from a continuous distribution with extremely fast convergence, e.g., converging within one epoch.
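Mutual-information objectives of this kind are typically made tractable with the classical Barber-Agakov variational lower bound. The bound below is a standard statement given for orientation, not necessarily the paper's exact instantiation; here X denotes the data, Z the latent architectural representation, and q(x | z) is any variational approximation to the true posterior p(x | z):

    \begin{aligned}
    I(X; Z) &= H(X) - H(X \mid Z) \\
            &= H(X) + \mathbb{E}_{p(x,z)}\left[\log p(x \mid z)\right] \\
            &\ge H(X) + \mathbb{E}_{p(x,z)}\left[\log q(x \mid z)\right].
    \end{aligned}

Since H(X) is a constant of the dataset, maximizing the bound amounts to maximizing \mathbb{E}_{p(x,z)}[\log q(x \mid z)]; the gap is the expected KL divergence between p(x \mid z) and q(x \mid z), which vanishes when q matches the true posterior.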

Experimental results:

    • State-of-the-art performance, with top-1 error rates of 2.45% on CIFAR-10 and 15.80% on CIFAR-100, within 10 minutes of search.

    • Direct search on ImageNet in an extremely fast 0.26 GPU-days, achieving state-of-the-art performance with a top-1 error rate of 23.8% and a top-5 error rate of 7.1%.

    • State-of-the-art performance on various NAS search spaces, including NAS-Bench-1Shot1, NAS-Bench-201, and the simplified search spaces S1-S4.

Conclusion

    • We provide a new insight into NAS: the architectural distribution is the latent representation of a given dataset, and a simple yet effective convolutional neural network can model the dependencies among the architectural distributions. Moreover, we propose a novel search strategy that maximizes a variational lower bound on the mutual information between the data points and the latent architectural representations (see the sketch after this list).

    • Experimental results demonstrate that VIM-NAS converges extremely fast, within one epoch, and achieves state-of-the-art performance on various search spaces.
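To make the strategy concrete, the following is a minimal, self-contained PyTorch sketch of the idea as we read it: a small CNN maps a batch of data to a joint architectural distribution, and a weight-sharing network is trained with a likelihood term standing in for the variational bound. All names (LatentNet, ToySupernet, NUM_EDGES, NUM_OPS), the toy supernet, and the use of the predictive likelihood as the variational surrogate are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_EDGES, NUM_OPS = 14, 8  # DARTS-like cell dimensions; an assumption

class LatentNet(nn.Module):
    """Small CNN mapping a data batch to per-edge operation distributions.
    Shared convolutional features couple the edges, instead of treating
    each edge's distribution as an independent free parameter."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, NUM_EDGES * NUM_OPS)

    def forward(self, x):
        logits = self.head(self.features(x)).view(-1, NUM_EDGES, NUM_OPS)
        return F.softmax(logits, dim=-1)  # architectural distribution

class ToySupernet(nn.Module):
    """Stand-in for a weight-sharing supernet: consumes the batch and the
    architecture weights and returns class logits. A real supernet would
    mix the candidate operations on every edge with these weights."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32 + NUM_EDGES * NUM_OPS, num_classes)

    def forward(self, x, arch):
        h = torch.cat([self.backbone(x), arch.flatten(1)], dim=1)
        return self.classifier(h)

latent_net, supernet = LatentNet(), ToySupernet()
opt = torch.optim.Adam(
    list(latent_net.parameters()) + list(supernet.parameters()), lr=3e-4)

x = torch.randn(8, 3, 32, 32)   # dummy CIFAR-sized batch
y = torch.randint(0, 10, (8,))  # dummy labels

arch = latent_net(x)  # data-dependent architectural distribution
# Variational surrogate (an assumption for illustration): maximize
# log q(y | x, arch) by minimizing cross-entropy, standing in for the
# E[log q(. | z)] term of the lower bound above.
loss = F.cross_entropy(supernet(x, arch), y)
opt.zero_grad(); loss.backward(); opt.step()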

