
AMA: An Analytical Approach to Maximizing the Efficiency of Deep Learning on Versal AI Engine

  • Xi'an Jiaotong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Traditional cache-based multi-core architectures, exemplified by CUDA, have long been plagued by the 'memory wall' problem, and many applications, such as large language model inference, cannot reach the high computational intensity these architectures require. The Versal AI Engine architecture provides rich and varied inter-core connections within its processor array, creating many more opportunities for data reuse and potentially alleviating the 'memory wall' issue. However, traditional parallel programming models cannot be applied directly to this architecture, and mapping computations so as to achieve high compute utilization becomes a new challenge. To address this, we propose AMA, a hierarchical analytical performance model built on the Versal AI Engine architecture and designed to maximize the efficiency of typical deep learning applications. Experiments show that AMA's modeling is accurate and efficient. On the VCK190 platform, we achieved a matrix multiplication throughput of 5867.29 GFLOPS in fp32 and 88.55 TOPS in int8, and a convolution throughput of 99.6770 TOPS. In terms of energy efficiency, AMA achieved 142.68 GFLOPS/W in fp32 and 1.416 TOPS/W in int8 matrix multiplication. Compared with current state-of-the-art methods, we achieved a 14.99% increase in throughput and a 22.92% increase in energy efficiency, providing a new analytical performance model and practical guidance for efficient deep learning deployment on the AI Engine.
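The throughput and energy-efficiency figures quoted in the abstract can be related by simple arithmetic. The sketch below divides each reported matrix multiplication throughput by the matching energy efficiency to estimate an implied power draw. It assumes that each pair of figures comes from the same run, which the abstract does not state, so the results are illustrative estimates and not values reported by the authors. The function name `implied_power_watts` is hypothetical.

```python
# Minimal sketch: estimate implied power from the abstract's reported figures.
# Assumption (not confirmed by the source): each throughput figure and its
# matching energy-efficiency figure were measured in the same configuration.

def implied_power_watts(throughput: float, efficiency_per_watt: float) -> float:
    """Return throughput / efficiency; both must use the same operation unit."""
    if efficiency_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return throughput / efficiency_per_watt

# Figures taken from the abstract (matrix multiplication on VCK190).
FP32_GFLOPS = 5867.29          # fp32 throughput, GFLOPS
FP32_GFLOPS_PER_W = 142.68     # fp32 energy efficiency, GFLOPS/W
INT8_TOPS = 88.55              # int8 throughput, TOPS
INT8_TOPS_PER_W = 1.416        # int8 energy efficiency, TOPS/W

fp32_power = implied_power_watts(FP32_GFLOPS, FP32_GFLOPS_PER_W)
int8_power = implied_power_watts(INT8_TOPS, INT8_TOPS_PER_W)

if __name__ == "__main__":
    print(f"Implied fp32 power: {fp32_power:.2f} W")
    print(f"Implied int8 power: {int8_power:.2f} W")
```

Under that assumption, the estimates come out at roughly 41 W for fp32 and roughly 63 W for int8. They are a consistency check on the quoted numbers only and say nothing about how the authors measured power.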

Original language: English
Title of host publication: Proceedings - 2024 34th International Conference on Field-Programmable Logic and Applications, FPL 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 227-235
Number of pages: 9
ISBN (Electronic): 9798331530075
State: Published - 2024
Event: 34th International Conference on Field-Programmable Logic and Applications, FPL 2024 - Torino, Italy
Duration: 2 Sep 2024 → 6 Sep 2024

Publication series

Name: Proceedings - 2024 34th International Conference on Field-Programmable Logic and Applications, FPL 2024

Conference

Conference: 34th International Conference on Field-Programmable Logic and Applications, FPL 2024
Country/Territory: Italy
City: Torino
Period: 2/09/24 → 6/09/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • AI Engine
  • Convolution
  • Deep Learning
  • Matrix Multiplication
  • Versal
