Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices

  • Hiroshi Fujimura
  • Ning Ding
  • Daichi Hayakawa
  • Takehiko Kagoshima

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper proposes a new method for simultaneous flexible keyword detection and text-dependent speaker identification using a recognized keyword. The purpose is to identify a speaker from among a set of pre-registered speakers on the basis of a short command utterance in an office or home, using low-resource chip devices. The first contribution is a process that combines a neural network (NN) and a customized Viterbi-based algorithm for flexible keyword detection with Gaussian mixture models (GMMs) for speaker identification. Outputs of a middle layer in the NN and alignment information from keyword detection are also used to create feature vectors for the speaker GMMs. The second contribution is to apply DropConnect in modeling the speaker uncertainties of the Bayesian NN that is used for speaker recognition. This results in robust speaker models when enrollment utterances are few. Evaluation was conducted using 39 Japanese keywords spoken by 100 speakers. Recognition performance was measured on the basis of false acceptances and false rejections using keyword utterances. Speaker identification among the 100 pre-registered speakers was evaluated simultaneously on the recognized keywords. The identification rate when using a conventional i-vector method was 71.22%. By contrast, the identification rate of the proposed method was 89.29% while using low-cost resources.
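
As a rough illustration of the GMM-based identification step mentioned in the abstract, the sketch below scores an utterance against one diagonal-covariance GMM per registered speaker and picks the speaker with the highest average log-likelihood. This is a generic textbook formulation and not the authors' implementation: the function names, the toy 2-D models, and the frame values are all hypothetical, and the paper's actual feature vectors (built from NN middle-layer outputs and alignment information) are not reproduced here.

```python
import math


def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))


def identify_speaker(frames, speaker_gmms):
    """Return the best-scoring registered speaker and all per-speaker scores."""
    scores = {}
    for name, (weights, means, variances) in speaker_gmms.items():
        total = sum(diag_gmm_loglik(f, weights, means, variances) for f in frames)
        scores[name] = total / len(frames)  # average log-likelihood per frame
    return max(scores, key=scores.get), scores


# Hypothetical toy models: (weights, means, variances) per speaker.
speaker_gmms = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
# Hypothetical feature frames from one keyword utterance.
frames = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.5]]
best, scores = identify_speaker(frames, speaker_gmms)
```

In a real system the GMM parameters would be estimated from each speaker's enrollment utterances; here they are fixed by hand only to keep the sketch self-contained.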

Original language: English
Title of host publication: ICPRAM 2020 - Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, Volume 1
Editors: Maria De Marsico, Gabriella Sanniti di Baja, Ana L.N. Fred
Publisher: Science and Technology Publications, Lda
Pages: 297-307
Number of pages: 11
ISBN (Print): 9789897583971
State: Published - 2020
Event: 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020 - Valletta, Malta
Duration: 22 Feb 2020 → 24 Feb 2020

Publication series

Name: International Conference on Pattern Recognition Applications and Methods
Volume: 1
ISSN (Electronic): 2184-4313

Conference

Conference: 9th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2020
Country/Territory: Malta
City: Valletta
Period: 22/02/20 → 24/02/20

Keywords

  • Bayesian
  • Detection
  • Low Resource Device
  • Speaker Identification
