TY - GEN
T1 - Exploiting Variable Precision Computation Array for Scalable Neural Network Accelerators
AU - Yang, Shaofei
AU - Liu, Longjun
AU - Li, Baoting
AU - Sun, Hongbin
AU - Zheng, Nanning
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/8
Y1 - 2020/8
N2 - In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combination computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32-bit to 3-bit in different accelerators) into 2-bit serial computation units, while the 2-bit computing units can be combined for parallel computing to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which further improves the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67×, 2.42×, and 1.52×, respectively, without other overhead on convolution layers.
AB - In this paper, we present a flexible Variable Precision Computation Array (VPCA) component for different accelerators, which leverages a sparsification scheme for activations and a low-bit serial-parallel combination computation unit to improve the efficiency and resiliency of accelerators. The VPCA can dynamically decompose the width of activations/weights (from 32-bit to 3-bit in different accelerators) into 2-bit serial computation units, while the 2-bit computing units can be combined for parallel computing to achieve high throughput. We propose an on-the-fly compressing and calculating strategy, SLE-CLC (single lane encoding, cross lane calculation), which further improves the performance of 2-bit parallel computing. Experimental results on image classification datasets show that VPCA outperforms DaDianNao, Stripes, and Loom-2bit by 4.67×, 2.42×, and 1.52×, respectively, without other overhead on convolution layers.
KW - Accelerator
KW - Deep Neural Networks
KW - Dynamic Quantization
KW - Energy Efficiency Computing Array
KW - Resiliency
UR - https://www.scopus.com/pages/publications/85085042534
U2 - 10.1109/AICAS48895.2020.9073832
DO - 10.1109/AICAS48895.2020.9073832
M3 - Conference contribution
AN - SCOPUS:85085042534
T3 - Proceedings - 2020 IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2020
SP - 315
EP - 319
BT - Proceedings - 2020 IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2020
Y2 - 31 August 2020 through 2 September 2020
ER -