TY - JOUR
T1 - REMAP
T2 - A Spatiotemporal CNN Accelerator Optimization Methodology and Toolkit Thereof
AU - Zhao, Boran
AU - Xia, Tian
AU - Zhai, Haiming
AU - Ma, Fulun
AU - Du, Yan
AU - Chang, Hanzhi
AU - Zhao, Wenzhe
AU - Ren, Pengju
N1 - Publisher Copyright:
© 1982-2012 IEEE.
PY - 2023/5/1
Y1 - 2023/5/1
N2 - Designing convolutional neural network (CNN) accelerators is becoming more difficult owing to the rapidly increasing variety of CNN models. Some approaches use a fixed dataflow and microarchitecture, which lowers design complexity. However, these accelerators are difficult to adapt to highly diverse CNN models and often suffer from low processing element utilization. Other accelerators resort to reconfigurable devices, such as field-programmable gate arrays (FPGAs) and coarse-grained reconfigurable arrays, to support flexible dataflows that fit diverse CNN layers. However, layer-by-layer processing may consume more energy because of frequent reconfiguration and off-chip DDR accesses. In this work, we introduce a reconfigurable pipeline accelerator (RPA) that reduces latency and DDR accesses by pipelining the computation of CNN layers. Although several studies have tried to speed up the design process by automatically exploring subsets of the accelerator design space, finding an automated design tool that can effectively identify a complete and optimal design scheme remains an open problem, especially for the novel RPA architecture type. Unfortunately, comprehensive exploration of the whole design space faces an excessively large search space. To tackle this problem, we propose REMAP, a toolkit for designing CNN accelerators based on the Monte Carlo tree search (MCTS) method. To search this huge design space efficiently, we propose several methods that improve search efficiency. Evaluations show that REMAP significantly outperforms several state-of-the-art approaches: compared with GAMMA, it achieves an average speedup of 14.75× and an energy reduction of 45.45%; it also achieves a speedup of 32.6× over ConfuciuX on MobileNetV2 and ResNet50. We also present an FPGA accelerator implementation based on REMAP's search results, which achieves high performance in real-time CNN tasks.
This indicates that REMAP can provide high-quality design exploration with valuable insights and useful architecture design guidance.
AB - Designing convolutional neural network (CNN) accelerators is becoming more difficult owing to the rapidly increasing variety of CNN models. Some approaches use a fixed dataflow and microarchitecture, which lowers design complexity. However, these accelerators are difficult to adapt to highly diverse CNN models and often suffer from low processing element utilization. Other accelerators resort to reconfigurable devices, such as field-programmable gate arrays (FPGAs) and coarse-grained reconfigurable arrays, to support flexible dataflows that fit diverse CNN layers. However, layer-by-layer processing may consume more energy because of frequent reconfiguration and off-chip DDR accesses. In this work, we introduce a reconfigurable pipeline accelerator (RPA) that reduces latency and DDR accesses by pipelining the computation of CNN layers. Although several studies have tried to speed up the design process by automatically exploring subsets of the accelerator design space, finding an automated design tool that can effectively identify a complete and optimal design scheme remains an open problem, especially for the novel RPA architecture type. Unfortunately, comprehensive exploration of the whole design space faces an excessively large search space. To tackle this problem, we propose REMAP, a toolkit for designing CNN accelerators based on the Monte Carlo tree search (MCTS) method. To search this huge design space efficiently, we propose several methods that improve search efficiency. Evaluations show that REMAP significantly outperforms several state-of-the-art approaches: compared with GAMMA, it achieves an average speedup of 14.75× and an energy reduction of 45.45%; it also achieves a speedup of 32.6× over ConfuciuX on MobileNetV2 and ResNet50. We also present an FPGA accelerator implementation based on REMAP's search results, which achieves high performance in real-time CNN tasks.
This indicates that REMAP can provide high-quality design exploration with valuable insights and useful architecture design guidance.
KW - Convolutional neural network (CNN) accelerators
KW - Monte Carlo tree search (MCTS)
KW - design toolkit
KW - pipeline
UR - https://www.scopus.com/pages/publications/85139445605
U2 - 10.1109/TCAD.2022.3207320
DO - 10.1109/TCAD.2022.3207320
M3 - Article
AN - SCOPUS:85139445605
SN - 0278-0070
VL - 42
SP - 1691
EP - 1704
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 5
ER -