Abstract
To address the compatibility gap between model compression algorithms and the versatile tensor accelerator (VTA), an adaptive fine-grained structured sparse design tailored to this accelerator is proposed by enhancing the classical YOLObile block-wise pruning method, and its performance is evaluated. In light of the multi-dimensional loop unrolling characteristics of the VTA, the model's weight tensors are divided into 32×32 blocks. The approach integrates temporal distillation and spatial distillation to align multi-dimensional features. Through a single-stage iterative training method, the computation process of the original ADMM algorithm is refined to improve deployment accuracy while reducing training cost. An adaptive layer pruning rate module is introduced to dynamically allocate the total pruning rate across layers, enabling end-to-end automated pruning. Experimental results show that the improved method reduces floating-point computations by approximately 2.4% and improves the accuracy of compressed models on tasks such as image classification and object detection, with a maximum accuracy gain of 2.6%. The method offers an efficient, lightweight software solution for the sparse deployment of deep learning models on the VTA.
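The abstract does not give the authors' block-selection criterion or the ADMM training details; purely as an illustration of the 32×32 block-wise pruning idea it builds on, the sketch below zeroes out the lowest-magnitude 32×32 tiles of a 2-D weight matrix (simple L1-norm tile scoring is an assumption here, not the paper's actual method):

```python
import numpy as np

def block_prune(weight, block=32, rate=0.5):
    """Zero out the lowest-magnitude (block x block) tiles of a 2-D weight matrix.

    Illustrative sketch only: tiles are scored by L1 norm and the lowest
    `rate` fraction is pruned; the paper's ADMM-based procedure differs.
    """
    # Pad so both dimensions are multiples of the block size (VTA-style tiling).
    rows = -(-weight.shape[0] // block) * block
    cols = -(-weight.shape[1] // block) * block
    padded = np.zeros((rows, cols), dtype=weight.dtype)
    padded[:weight.shape[0], :weight.shape[1]] = weight

    # View the matrix as a grid of (block x block) tiles and score each tile.
    tiles = padded.reshape(rows // block, block, cols // block, block)
    scores = np.abs(tiles).sum(axis=(1, 3))

    # Prune the k lowest-scoring tiles; keep the rest intact.
    k = int(scores.size * rate)
    if k > 0:
        thresh = np.partition(scores.ravel(), k - 1)[k - 1]
        mask = scores > thresh
    else:
        mask = np.ones_like(scores, dtype=bool)

    pruned = tiles * mask[:, None, :, None]
    return pruned.reshape(rows, cols)[:weight.shape[0], :weight.shape[1]]
```

Because whole 32×32 tiles are either kept or zeroed, the resulting sparsity pattern maps directly onto the accelerator's tiled matrix units, which is the compatibility property the paper targets.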
| Translated title of the contribution | Fine-Grained Structured Sparse Design for Versatile Tensor Accelerator |
|---|---|
| Original language | Chinese (Traditional) |
| Pages (from-to) | 176-184 |
| Number of pages | 9 |
| Journal | Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University |
| Volume | 58 |
| Issue number | 11 |
| DOIs | |
| State | Published - Nov 2024 |