RTA: A Reconfigurable Transformer Accelerator Exploiting Sparsity via Low-Bit-Width Prediction

  • Yujie Chen
  • Chen Yang
  • Yuheng Xia
  • Yishuo Meng
  • Jianfei Wang
  • Qiang Fu
  • Li Geng

Research output: Contribution to journal › Article › peer-review

Abstract

Transformer models have received widespread attention in recent years. They have gradually replaced recurrent neural networks (RNNs) in natural language processing (NLP) and are widely used in tasks such as machine translation, text generation, and language understanding; transformers have likewise shown impressive results in computer vision (CV). However, their attention mechanism places high demands on the computational and storage resources of the hardware, and deploying transformers on edge computing platforms is challenging due to their complex data flow, intensive matrix computations, and the need for high-precision nonlinear functions. To address these challenges, we propose the reconfigurable transformer accelerator (RTA), which exploits dynamic sparsity via low-bit-width prediction. RTA reduces resource consumption by performing sparse matrix multiplications with low-bit-width operations, while its reconfigurable design allows the same sparse module to be reused for high-precision, large-bit-width matrix multiplications. We also optimize the RTA computing pipeline to reduce resource usage and improve computational efficiency, and we incorporate feature sharing to raise the resource utilization of the hardware accelerator. Experimental results on the transformer-base model show that RTA achieves an average performance of 994 GOPS and a digital signal processor (DSP) efficiency of 1412. Compared to state-of-the-art transformer accelerators, RTA achieves (Formula presented) efficiency.
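The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea it describes: estimate attention scores with cheaply quantized operands, predict which entries will matter, and spend full precision only on those. The function names (`quantize`, `predicted_sparse_attention`), the 4-bit setting, and the thresholding rule are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def quantize(x, bits=4):
    # Symmetric uniform quantization to a signed low-bit-width grid (sketch).
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    return np.round(x / scale).astype(np.int32), scale

def predicted_sparse_attention(Q, K, V, bits=4, threshold=0.0):
    # 1) Cheap low-bit-width estimate of the attention scores
    #    (an integer matmul when mapped onto hardware).
    Qq, sq = quantize(Q, bits)
    Kq, sk = quantize(K, bits)
    approx = (Qq @ Kq.T) * (sq * sk)

    # 2) Predict which entries matter; always keep each row's best
    #    score so no query row ends up with an empty candidate set.
    mask = approx > threshold
    mask[np.arange(mask.shape[0]), approx.argmax(axis=-1)] = True

    # 3) Compute high-precision scores only where predicted useful;
    #    pruned entries are treated as -inf before the softmax.
    scores = np.full(approx.shape, -np.inf)
    full = (Q @ K.T) / np.sqrt(Q.shape[-1])
    scores[mask] = full[mask]

    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V

# Toy usage: 8 tokens, head dimension 16.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)) for _ in range(3))
out = predicted_sparse_attention(Q, K, V, bits=4, threshold=0.0)
print(out.shape)  # (8, 16)
```

In RTA, the low-bit-width predictor and the high-precision multiply would presumably map onto the same reconfigurable compute module; the sketch only mirrors that two-pass structure in software.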

Original language: English
Pages (from-to): 2702-2714
Number of pages: 13
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 33
Issue number: 10
DOIs
State: Published - 2025

Keywords

  • Hardware acceleration
  • hardware reconfigurability
  • low-bit-width prediction
  • transformer
