Variational HyperAdam: A Meta-Learning Approach to Network Training

  • Xi'an Jiaotong University
  • Guangdong Artificial Intelligence and Digital Economy Laboratory - Guangzhou

Research output: Contribution to journal › Article › peer-review

14 Citations (Scopus)

Abstract

Stochastic optimization algorithms have been popular for training deep neural networks. Recently, a new approach, the learning-based optimizer, has emerged and achieved promising performance for training neural networks. However, these black-box learning-based optimizers do not fully take advantage of the experience embodied in human-designed optimizers and rely heavily on learning from meta-training tasks, and therefore have limited generalization ability. In this paper, we propose a novel optimizer, dubbed Variational HyperAdam, which is based on a parametric generalized Adam algorithm, i.e., HyperAdam, in a variational framework. With Variational HyperAdam as the optimizer for training a neural network, the parameter update vector of the network at each training step is treated as a random variable, whose approximate posterior distribution given the training data and the current network parameter vector is predicted by Variational HyperAdam. The parameter update vector for network training is sampled from this approximate posterior distribution. Specifically, in Variational HyperAdam, we design a learnable generalized Adam algorithm for estimating the expectation, paired with a VarBlock for estimating the variance, of the approximate posterior distribution of the parameter update vector. Variational HyperAdam is learned in a meta-learning approach with a meta-training loss derived by variational inference. Experiments verify that the learned Variational HyperAdam achieves state-of-the-art network training performance for various types of networks, such as multilayer perceptron, CNN, LSTM and ResNet, on different datasets.
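
The sampling idea in the abstract (a mean from an Adam-style rule, a variance from a separate block, and an update drawn from the resulting distribution) can be illustrated with a minimal sketch. This is not the paper's method: the mean here comes from plain Adam with fixed hyperparameters rather than the learned, generalized HyperAdam, and the VarBlock is replaced by a hypothetical stand-in that sets the standard deviation proportional to the magnitude of the mean. The function names, the `noise_scale` parameter, and the quadratic toy objective are all assumptions made for illustration.

```python
import math
import random


def adam_mean(grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One plain Adam step: returns the update mean and the new moment estimates."""
    m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias-corrected first and second moments.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    mean = [-lr * mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return mean, m, v


def sample_update(mean, std, rng):
    """Reparameterized Gaussian draw: update = mean + std * noise, noise ~ N(0, 1)."""
    return [mu + s * rng.gauss(0.0, 1.0) for mu, s in zip(mean, std)]


def train_quadratic(steps=500, noise_scale=0.1, seed=0):
    """Minimize f(w) = sum(w_i^2) with sampled updates; returns the final loss."""
    rng = random.Random(seed)
    w = [3.0, -2.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grad = [2.0 * wi for wi in w]
        mean, m, v = adam_mean(grad, m, v, t)
        # Hypothetical stand-in for the VarBlock (assumption, not from the paper):
        # the standard deviation is a fixed fraction of |mean|.
        std = [noise_scale * abs(mu) for mu in mean]
        update = sample_update(mean, std, rng)
        w = [wi + ui for wi, ui in zip(w, update)]
    return sum(wi * wi for wi in w)


if __name__ == "__main__":
    print("final loss:", train_quadratic())
```

In the paper's setting both the mean and the variance estimators are learned by meta-training; the fixed rules above only show where the sampled update enters the training loop.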

Original language: English
Pages (from-to): 4469-4484
Number of pages: 16
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 44
Issue number: 8
DOI
Publication status: Published - 1 Aug 2022
