Abstract
In this paper, we study the generalization performance of global minima of empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving optimal generalization error rates for numerous types of data under mild conditions. Since over-parameterization of deep ReLU nets is crucial to guarantee that the global minima of ERM can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results present a potential way to fill the gap between optimization and generalization of deep learning.
| Original language | English |
|---|---|
| Pages (from-to) | 1978-1993 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 71 |
| Issue number | 3 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Deep learning
- empirical risk minimization
- global minima
- over-parameterization