Learning Latent Architectural Distribution in Differentiable Neural Architecture Search via Variational Information Maximization
Yaoming Wang1, Yuchen Liu1, Wenrui Dai2, Chenglin Li1
Junni Zou2, Hongkai Xiong1
1Department of Electronic Engineering, Shanghai Jiao Tong University, China
2Department of Computer Science & Engineering, Shanghai Jiao Tong University, China

Abstract
Existing differentiable neural architecture search approaches simply assume that the architectural distributions on different edges are independent of each other, which conflicts with the intrinsic properties of the architecture. In this paper, we view the architectural distribution as the latent representation of specific data points. We then propose Variational Information Maximization Neural Architecture Search (VIM-NAS), which leverages a simple yet effective convolutional neural network to model the latent representation and optimizes a tractable variational lower bound on the mutual information between the data points and the latent representations. VIM-NAS automatically learns a nearly one-hot distribution from a continuous distribution and converges extremely fast, e.g., within a single epoch.
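The "tractable variational lower bound" referenced in the abstract is presumably of the standard Barber–Agakov form; as a sketch, assuming a variational decoder q(x|z) that approximates the true posterior p(x|z) of the data x given the latent architectural representation z:

\[
I(X; Z) \;=\; H(X) - H(X \mid Z) \;\geq\; H(X) + \mathbb{E}_{p(x, z)}\!\left[\log q(x \mid z)\right],
\]

with equality when q(x|z) = p(x|z). Since H(X) is constant with respect to the model parameters, maximizing the bound amounts to maximizing \(\mathbb{E}_{p(x, z)}[\log q(x \mid z)]\) jointly over the CNN that models the latent architectural distribution and the decoder q(x|z); the exact parameterization used by VIM-NAS is given in the paper body, not here.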