TY - JOUR
T1 - GMM and CNN Hybrid Method for Short Utterance Speaker Recognition
AU - Liu, Zheli
AU - Wu, Zhendong
AU - Li, Tong
AU - Li, Jin
AU - Shen, Chao
N1 - Publisher Copyright: © 2005-2012 IEEE.
PY - 2018/7
Y1 - 2018/7
N2 - In recent years, speaker recognition has attracted wide attention for its extensive applications in fields such as speech communications, domestic services, and smart terminals. As a critical method, the Gaussian mixture model (GMM) achieves recognition capability close to human hearing ability on long speech. However, the GMM fails to recognize speakers from short utterances with high accuracy. To solve this problem, we propose a novel model that enhances the accuracy of short utterance speaker recognition. Unlike traditional GMM-based models, our method trains a convolutional neural network to process spectrograms, which can describe speakers better. As a result, the recognition system achieves considerable accuracy together with a reasonable convergence speed. Experimental results show that our model reduces the equal error rate of recognition from 4.9% to 2.5%.
AB - In recent years, speaker recognition has attracted wide attention for its extensive applications in fields such as speech communications, domestic services, and smart terminals. As a critical method, the Gaussian mixture model (GMM) achieves recognition capability close to human hearing ability on long speech. However, the GMM fails to recognize speakers from short utterances with high accuracy. To solve this problem, we propose a novel model that enhances the accuracy of short utterance speaker recognition. Unlike traditional GMM-based models, our method trains a convolutional neural network to process spectrograms, which can describe speakers better. As a result, the recognition system achieves considerable accuracy together with a reasonable convergence speed. Experimental results show that our model reduces the equal error rate of recognition from 4.9% to 2.5%.
KW - Convolutional neural network (CNN)
KW - speaker verification
KW - spectrogram
KW - universal background model maximum a posteriori Gaussian mixture model (UBM-MAP-GMM)
UR - https://www.scopus.com/pages/publications/85043385905
U2 - 10.1109/TII.2018.2799928
DO - 10.1109/TII.2018.2799928
M3 - Article
AN - SCOPUS:85043385905
SN - 1551-3203
VL - 14
SP - 3244
EP - 3252
JO - IEEE Transactions on Industrial Informatics
JF - IEEE Transactions on Industrial Informatics
IS - 7
ER -