Generalization Performance of Empirical Risk Minimization on Over-Parameterized Deep ReLU Nets

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.
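As a toy illustration of the existence claim (this is not the paper's deepening construction, and the sizes below are arbitrary): once a one-hidden-layer ReLU net is over-parameterized, with width at least the sample size, a perfect global minimum of ERM — an interpolating solution with zero empirical risk — can be exhibited directly by fixing random hidden weights and solving for the output layer in closed form.

```python
import numpy as np

# Sketch: with width >> n, the random ReLU feature matrix H has full row
# rank almost surely, so output weights exactly fitting the data exist.
rng = np.random.default_rng(0)

n, d, width = 20, 5, 500                        # width >> n: over-parameterized
X = rng.normal(size=(n, d))                     # inputs
y = rng.normal(size=(n, 1))                     # arbitrary targets

W = rng.normal(size=(d, width)) / np.sqrt(d)    # fixed random hidden weights
H = np.maximum(X @ W, 0.0)                      # ReLU features, shape (n, width)

# Minimum-norm output weights interpolating the data
a, *_ = np.linalg.lstsq(H, y, rcond=None)

risk = float(np.mean((H @ a - y) ** 2))         # empirical squared-error risk
print(f"empirical risk at this interpolating solution: {risk:.2e}")
```

The least-squares step plays the role of an idealized optimizer; the paper's point is that such global minima not only exist but can come with optimal generalization guarantees, and that over-parameterization makes them reachable by SGD.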

Original language: English
Pages (from-to): 1978-1993
Number of pages: 16
Journal: IEEE Transactions on Information Theory
Volume: 71
Issue number: 3
DOIs
State: Published - 2025

Keywords

  • Deep learning
  • empirical risk minimization
  • global minima
  • over-parameterization
