
A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2

  • Xi'an Jiaotong University
  • Beijing Sankuai Online Technology Co.,Ltd.

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

Speech synthesis, an artificial intelligence technology that uses computers to imitate human speech, plays a crucial role in human–computer interaction because it can automatically convert text into speech with satisfactory intelligibility and naturalness. Tacotron2 is the second-generation end-to-end English speech synthesis model developed by Google. As Mandarin becomes more widely used around the world, Mandarin speech synthesis is being deployed in a growing range of applications. To extend Tacotron2 to Mandarin speech, this paper proposes a novel synthesis method that adds a Mandarin-to-PinYin module and a prosodic structure prediction model to Tacotron2. Subjective and objective evaluations of the synthesized results show that the added prosodic structure prediction model helps Tacotron2 synthesize more natural and human-like Mandarin speech.
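The abstract gives no implementation details, so the following is only a minimal sketch of how a text front end of this kind might be arranged. It assumes a toy character-to-PinYin table (`TOY_PINYIN`), tone-numbered syllables, and boundary markers `#1` and `#3` standing in for the output of a prosodic structure predictor. All of these names and conventions are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical toy lexicon for illustration only; a real Mandarin-to-PinYin
# module would need a full dictionary and polyphone disambiguation.
TOY_PINYIN: Dict[str, str] = {
    "我": "wo3", "们": "men5",
    "喜": "xi3", "欢": "huan1",
    "音": "yin1", "乐": "yue4",
}


def word_to_pinyin(word: str) -> List[str]:
    """Map each character of a segmented word to a tone-numbered syllable."""
    return [TOY_PINYIN[ch] for ch in word]


def build_symbol_sequence(words: List[str], boundaries: List[str]) -> str:
    """Interleave PinYin syllables with one prosodic boundary label per word.

    `boundaries` stands in for a prosodic structure predictor's output:
    one label (an assumed convention such as '#1' or '#3') after each word.
    """
    if len(words) != len(boundaries):
        raise ValueError("expected exactly one boundary label per word")
    symbols: List[str] = []
    for word, boundary in zip(words, boundaries):
        symbols.extend(word_to_pinyin(word))
        symbols.append(boundary)
    return " ".join(symbols)


if __name__ == "__main__":
    # Boundary labels are hard-coded here; in practice a model would predict them.
    print(build_symbol_sequence(["我们", "喜欢", "音乐"], ["#1", "#1", "#3"]))
```

In a sketch like this, the resulting symbol string would replace raw characters as the input sequence to the acoustic model, so that phrasing cues are visible to it alongside pronunciation.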

Original language: English
Pages (from-to): 2809-2823
Number of pages: 15
Journal: International Journal of Machine Learning and Cybernetics
Volume: 12
Issue number: 10
State: Published - Oct 2021

Keywords

  • Intelligibility
  • Naturalness
  • Prosodic structure prediction
  • Speech synthesis
  • Tacotron2
