TY - GEN
T1 - A reconfigurable parallel FPGA accelerator for the adapt-then-combine diffusion LMS algorithm
AU - Yu, Qihang
AU - Ma, Yongqiang
AU - Chen, Badong
AU - Principe, Jose
AU - Zheng, Nanning
AU - Ren, Pengju
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/7/29
Y1 - 2016/7/29
N2 - Combining diffusion strategies with the least-mean-square (LMS) algorithm offers many advantages for adaptive filters in solving distributed optimization, estimation, and inference problems. However, because of its high computational complexity, a software implementation of the diffusion LMS algorithm is unsuitable for real-time and portable applications. To extend its applicability, we design a reconfigurable parallel FPGA accelerator that exploits multiple dimensions of parallelism, including parallel execution of agent state updating, data combining, and data training, together with a multi-stage pipeline, to reduce execution time. The accelerator is implemented for networks with various numbers of agents and different input dimensions. Results demonstrate that it can achieve a speedup of three orders of magnitude at 100 MHz compared with a C implementation for a 32-node network with 16-dimensional input data.
AB - Combining diffusion strategies with the least-mean-square (LMS) algorithm offers many advantages for adaptive filters in solving distributed optimization, estimation, and inference problems. However, because of its high computational complexity, a software implementation of the diffusion LMS algorithm is unsuitable for real-time and portable applications. To extend its applicability, we design a reconfigurable parallel FPGA accelerator that exploits multiple dimensions of parallelism, including parallel execution of agent state updating, data combining, and data training, together with a multi-stage pipeline, to reduce execution time. The accelerator is implemented for networks with various numbers of agents and different input dimensions. Results demonstrate that it can achieve a speedup of three orders of magnitude at 100 MHz compared with a C implementation for a 32-node network with 16-dimensional input data.
KW - Diffusion least mean square
KW - FPGA hardware acceleration
UR - https://www.scopus.com/pages/publications/84983459122
U2 - 10.1109/ISCAS.2016.7527216
DO - 10.1109/ISCAS.2016.7527216
M3 - Conference contribution
AN - SCOPUS:84983459122
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - 245
EP - 248
BT - ISCAS 2016 - IEEE International Symposium on Circuits and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Symposium on Circuits and Systems, ISCAS 2016
Y2 - 22 May 2016 through 25 May 2016
ER -