TY - JOUR
T1 - Robust sound source localization using convolutional neural network based on microphone array
AU - Zhao, Xiaoyan
AU - Zhou, Lin
AU - Tong, Ying
AU - Qi, Yuxiao
AU - Shi, Jingang
N1 - Publisher Copyright: © 2021, Tech Science Press. All rights reserved.
PY - 2021
Y1 - 2021
AB - To improve the performance of microphone array-based sound source localization (SSL), this paper proposes a robust SSL algorithm using a convolutional neural network (CNN). The Gammatone sub-band steered response power-phase transform (SRP-PHAT) spatial spectrum is adopted as the localization cue because of the feature correlation between consecutive sub-bands. Because a CNN has the “weight sharing” characteristic and is well suited to processing tensor data, it is used to extract spatial location information from the localization cues. The Gammatone sub-band SRP-PHAT spatial spectra are calculated from the microphone signals, which are decomposed in the frequency domain by a Gammatone filter bank. The proposed algorithm takes as CNN input a two-dimensional feature matrix assembled from the Gammatone sub-band SRP-PHAT spatial spectra within a frame. Taking advantage of the powerful modeling capability of the CNN, the two-dimensional feature matrices from diverse environments are used together to train a CNN model that reflects the mapping between the feature matrix and the azimuth of the sound source. The azimuth of a testing signal is then predicted by the trained CNN model. Experimental results show the superiority of the proposed algorithm for the SSL problem: it achieves significantly improved localization performance, robustness, and generality in various acoustic environments.
KW - Convolutional neural network
KW - Gammatone sub-band steered response power-phase transform spatial spectrum
KW - Microphone array
KW - Sound source localization
UR - https://www.scopus.com/pages/publications/85111630668
U2 - 10.32604/iasc.2021.018823
DO - 10.32604/iasc.2021.018823
M3 - Article
AN - SCOPUS:85111630668
SN - 1079-8587
VL - 30
SP - 361
EP - 371
JO - Intelligent Automation and Soft Computing
JF - Intelligent Automation and Soft Computing
IS - 1
ER -