Abstract
Leveraging sparsity in convolutional neural networks (CNNs) has emerged as a promising technique for enhancing the performance of CNN accelerators. However, despite significant improvements in multiplier efficiency, current sparse-based methods often fail to match the performance and runtime latency of conventional dense-based methods. This study posits that the extremely irregular workload distribution in the input feature maps (IFMs) makes it challenging for sparse-based accelerators to improve performance and multiplier efficiency simultaneously. To address this challenge, this study proposes a novel framework for eliminating irregularities at multiple levels (dataflow, algorithm, and hardware). Specifically, a computation-oriented dataflow combining filter decomposition and Winograd algorithms is designed for theoretical workload balancing across different convolution tasks. Furthermore, by exploiting the near-structured characteristic of IFMs, an online tile-based regularization scheme and a hybrid computation reduction method are designed to achieve a reduced and regular workload distribution. Finally, a large-scale sparse CNN accelerator, which integrates a row-merging scheme and a workload remapping method, is implemented to further eliminate hardware-level irregularities. Evaluation results show that the proposed methods achieve 45.93%~76.59% multiplication savings on VGG16, ResNet-34, and ResNet-50. Moreover, the accelerator achieves 3.06 TOPS and a sparsity extraction efficiency (SEE) of 2.86 when deploying VGG16, a 1.17× to 3.45× improvement in SEE over state-of-the-art sparse-based accelerators.
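The abstract does not detail the tile-based regularization scheme itself. As a minimal illustration of the general idea only (a hypothetical top-k rule, not the paper's actual method), the sketch below caps every IFM tile at the same number of nonzero activations, which both reduces and equalizes the per-tile workload, and then counts the resulting multiplication savings.

```python
# Illustrative sketch (assumed scheme, not the paper's): impose
# near-structured sparsity by keeping only the k largest-magnitude
# activations in each IFM tile, so every tile carries the same
# (bounded) multiplication workload.
def regularize_tile(tile, k):
    """Zero all but the k largest-magnitude values in a flat tile."""
    keep = sorted(range(len(tile)), key=lambda i: abs(tile[i]),
                  reverse=True)[:k]
    return [v if i in keep else 0.0 for i, v in enumerate(tile)]

def multiplication_savings(tiles, k):
    """Fraction of multiplications skipped once each tile has <= k nonzeros."""
    total = sum(len(t) for t in tiles)
    kept = sum(sum(1 for v in regularize_tile(t, k) if v != 0.0)
               for t in tiles)
    return 1.0 - kept / total

# Two 4-element tiles, each capped at 2 nonzeros -> half the
# multiplications are skipped, and the workload per tile is uniform.
tiles = [[0.9, 0.0, 0.1, 0.0], [0.2, 0.7, 0.0, 0.05]]
savings = multiplication_savings(tiles, k=2)  # -> 0.5
```

Bounding nonzeros per tile rather than per whole feature map is what makes the resulting workload regular: every processing element receives at most k products per tile, which is the kind of balanced distribution the abstract's multi-level irregularity elimination targets.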
| Original language | English |
|---|---|
| Pages (from-to) | 6578-6591 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
| Volume | 72 |
| Issue number | 11 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Hardware accelerator
- IFM regularization
- convolutional neural network
- row merging
- sparsity extraction efficiency
Title: An Efficient CNN Accelerator Exploiting Novel Tile-Based Near-Structured Sparsity to Achieve Multi-Level Irregularity Elimination