TY - GEN
T1 - An Energy-Efficient and Flexible Accelerator based on Reconfigurable Computing for Multiple Deep Convolutional Neural Networks
AU - Yang, Chen
AU - Zhang, Haibo
AU - Wang, Xiaoli
AU - Geng, Li
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/12/5
Y1 - 2018/12/5
N2 - Multiple Convolutional Neural Networks (CNNs) have become widely used in modern AI systems. There is an increasing need to apply different CNN shapes to different scenarios. However, this also brings challenges in throughput, energy efficiency and flexibility for hardware. In this paper, a novel accelerator, called the reconfigurable neural accelerator (RNA), was proposed based on reconfigurable computing technology. In addition, image row broadcast (IRB) and zero detection technology (ZDT) were applied to increase energy efficiency and throughput. IRB optimizes the convolutional dataflow on a spatial array architecture with 22×22 processing elements, increasing data reuse and reducing data movement. ZDT reduces weight data accesses in the fully connected layer. At a power consumption of 10.25 W on the Virtex UltraScale XCVU440 platform, RNA processes the convolutional layers at 97.4 GOPS for AlexNet, 90.75 GOPS for VGG and 100.8 GOPS for LeNet-5.
AB - Multiple Convolutional Neural Networks (CNNs) have become widely used in modern AI systems. There is an increasing need to apply different CNN shapes to different scenarios. However, this also brings challenges in throughput, energy efficiency and flexibility for hardware. In this paper, a novel accelerator, called the reconfigurable neural accelerator (RNA), was proposed based on reconfigurable computing technology. In addition, image row broadcast (IRB) and zero detection technology (ZDT) were applied to increase energy efficiency and throughput. IRB optimizes the convolutional dataflow on a spatial array architecture with 22×22 processing elements, increasing data reuse and reducing data movement. ZDT reduces weight data accesses in the fully connected layer. At a power consumption of 10.25 W on the Virtex UltraScale XCVU440 platform, RNA processes the convolutional layers at 97.4 GOPS for AlexNet, 90.75 GOPS for VGG and 100.8 GOPS for LeNet-5.
KW - CNN
KW - Image Row Broadcast dataflow
KW - Reconfigurable computing
KW - Zero Detection Technology
UR - https://www.scopus.com/pages/publications/85060305393
U2 - 10.1109/ICSICT.2018.8565823
DO - 10.1109/ICSICT.2018.8565823
M3 - Conference contribution
AN - SCOPUS:85060305393
T3 - 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2018 - Proceedings
BT - 2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2018 - Proceedings
A2 - Tang, Ting-Ao
A2 - Ye, Fan
A2 - Jiang, Yu-Long
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th IEEE International Conference on Solid-State and Integrated Circuit Technology, ICSICT 2018
Y2 - 31 October 2018 through 3 November 2018
ER -