Abstract
Speech synthesis, an artificial intelligence technology that uses computers to imitate human speech, plays a crucial role in human–computer interaction because it can automatically convert text into speech with satisfactory intelligibility and naturalness. Tacotron2 is the second-generation end-to-end English speech synthesis model developed by Google. As Mandarin becomes increasingly popular worldwide, Mandarin speech synthesis technologies have been deployed in a variety of applications. To extend Tacotron2 to Mandarin, this paper proposes a novel synthesis method that adds a Mandarin-to-PinYin module and a prosodic structure prediction model to Tacotron2. Subjective and objective evaluations of the synthesized speech show that the added prosodic structure prediction model helps Tacotron2 synthesize more natural and human-like Mandarin speech.
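The front end described in the abstract (Mandarin text converted to PinYin, with predicted prosodic structure attached) might be pictured with the following minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy lexicon `TOY_PINYIN`, the function name `to_model_input`, the tone-number notation, and the `#1`/`#3` boundary tags are all hypothetical, and the prosodic grouping is supplied by hand rather than predicted by a model.

```python
# Hypothetical toy lexicon (citation tones only). A real system would rely on
# a full pronunciation dictionary and handle polyphones and tone sandhi.
TOY_PINYIN = {"你": "ni3", "好": "hao3", "世": "shi4", "界": "jie4"}


def to_model_input(prosodic_words):
    """Build a PinYin token string with prosodic boundary tags.

    prosodic_words: list of (word, boundary_tag) pairs. The grouping and the
    tags stand in for the output of a prosodic structure predictor; the tag
    names are an assumed convention, not the paper's.
    """
    tokens = []
    for word, tag in prosodic_words:
        for ch in word:
            # Raises KeyError for characters missing from the toy lexicon.
            tokens.append(TOY_PINYIN[ch])
        tokens.append(tag)  # mark the boundary that closes this word
    return " ".join(tokens)


# Hand-written grouping standing in for a predictor's output.
example = to_model_input([("你好", "#1"), ("世界", "#3")])
print(example)
```

In a sketch like this, the resulting token string would take the place of the English character sequence that an acoustic model such as Tacotron2 normally consumes.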
| Original language | English |
|---|---|
| Pages (from-to) | 2809-2823 |
| Number of pages | 15 |
| Journal | International Journal of Machine Learning and Cybernetics |
| Volume | 12 |
| Issue | 10 |
| DOI | |
| Publication status | Published - Oct 2021 |
Title: A novel method for Mandarin speech synthesis by inserting prosodic structure prediction into Tacotron2