Abstract
Batch normalization (BN) has become ubiquitous in modern deep learning architectures because of the remarkable improvement it brings to deep neural network (DNN) training. However, the two-pass computation in BN training, statistical estimation followed by element-wise normalization, requires two accesses to the input data, resulting in a large increase in off-chip memory traffic during DNN training. In this brief, we propose a novel accelerator, named the one-pass normalizer (OPN), to achieve memory-efficient BN for on-device training. In terms of dataflow, we propose one-pass computation based on sampling-based range normalization and sparse data recovery to reduce the off-chip memory access of BN. For the OPN circuit, we propose channel-wise constant extraction to achieve a compact design. Experimental results show that one-pass computation reduces the off-chip memory access of BN by 2.0–3.8× compared with previous state-of-the-art designs while maintaining training performance. Moreover, channel-wise constant extraction reduces the gate count and power consumption of OPN by 56% and 73%, respectively.
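The memory-traffic contrast between standard two-pass BN and a one-pass, sampling-based range normalization can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's actual design: the function names, the 25% sampling fraction, and the range-to-standard-deviation scaling constant (based on the expected range of roughly Gaussian activations) are illustrative assumptions.

```python
import numpy as np

def bn_two_pass(x):
    """Standard BN over the batch axis: the tensor is read twice."""
    # Pass 1: read x to estimate per-channel mean and variance.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    # Pass 2: read x again to normalize element-wise.
    return (x - mu) / np.sqrt(var + 1e-5)

def range_norm_one_pass(x, sample_frac=0.25, seed=0):
    """Sketch of sampling-based range normalization (assumed form).

    Statistics come from a small on-chip sample, so the full tensor
    only needs a single read for the element-wise normalization.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
    s = x[idx]
    mu = s.mean(axis=0)
    # Scale the per-channel range so it approximates the standard
    # deviation for roughly Gaussian activations: the expected range
    # of m Gaussian samples is about 2*sqrt(2*ln(m)) * sigma.
    rng_est = s.max(axis=0) - s.min(axis=0)
    scale = rng_est / (2.0 * np.sqrt(2.0 * np.log(s.shape[0])))
    # Single pass over the full tensor.
    return (x - mu) / (scale + 1e-5)
```

In this toy model the two-pass version touches the input tensor twice while the sampled range version touches it once plus a small sample, which mirrors the off-chip traffic reduction the brief targets.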
| Original language | English |
|---|---|
| Pages (from-to) | 3186-3190 |
| Number of pages | 5 |
| Journal | IEEE Transactions on Circuits and Systems II: Express Briefs |
| Volume | 71 |
| Issue number | 6 |
| DOIs | |
| State | Published - 1 Jun 2024 |
Keywords
- Memory-efficient accelerator
- batch normalization
- deep neural networks
- on-device training
- one-pass computation