
A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2

  • Xi'an Jiaotong University
  • Beijing Sankuai Online Technology Co.,Ltd.

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)

Abstract

Speech synthesis, an artificial intelligence technology that uses computers to imitate human speech, plays a crucial role in human–computer interaction because it can automatically convert text into speech with satisfactory intelligibility and naturalness. Tacotron2 is the second-generation end-to-end English speech synthesis model developed by Google. As Mandarin becomes increasingly popular worldwide, Mandarin speech synthesis technologies have been used in a variety of applications. To extend Tacotron2 to Mandarin speech, we propose in this paper a novel synthesis method that adds a Mandarin-to-PinYin module and a prosodic structure prediction model to Tacotron2. Subjective and objective evaluations of the synthesized results show that the added prosodic structure prediction model helps Tacotron2 synthesize more natural and human-like Mandarin speech.
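
The abstract names two front-end additions, a Mandarin-to-PinYin module and a prosodic structure prediction model, but this record does not describe how they work. As a rough illustration only, the sketch below shows one way such a front end could assemble a text input for Tacotron2. The toy lexicon `PINYIN`, the function `to_tacotron_input`, and the boundary labels `#1`/`#2`/`#3` are all hypothetical and are not taken from the paper. The boundary labels are supplied by hand here, standing in for the output of a prediction model.

```python
# Toy character-to-pinyin lexicon with tone numbers (hypothetical, for
# illustration only). A real system would need a full pronunciation lexicon
# and would have to handle polyphonic characters and tone sandhi.
PINYIN = {
    "今": "jin1",
    "天": "tian1",
    "气": "qi4",
    "很": "hen3",
    "好": "hao3",
}


def to_tacotron_input(words, boundaries):
    """Build a space-separated token string from segmented words.

    words:      list of Mandarin words (already segmented).
    boundaries: one prosodic boundary label per word, placed after that word.
                The labels are assumed here to come from some prosodic
                structure predictor; this sketch does not implement one.
    """
    if len(words) != len(boundaries):
        raise ValueError("need exactly one boundary label per word")

    tokens = []
    for word, boundary in zip(words, boundaries):
        # Convert each character to its pinyin syllable (KeyError if the
        # character is missing from the toy lexicon).
        for char in word:
            tokens.append(PINYIN[char])
        # Append the boundary label that follows this word.
        tokens.append(boundary)
    return " ".join(tokens)


sequence = to_tacotron_input(
    ["今天", "天气", "很", "好"],
    ["#1", "#2", "#1", "#3"],
)
print(sequence)
```

In this sketch the acoustic model would receive pinyin syllables interleaved with boundary tokens instead of raw characters, so pause and phrasing information is explicit in the input.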

Original language: English
Pages (from-to): 2809-2823
Number of pages: 15
Journal: International Journal of Machine Learning and Cybernetics
Volume: 12
Issue number: 10
DOI
Publication status: Published - Oct 2021
