TDE-VC: Timbre Disentanglement and Extraction Via Consistency for Zero-Shot Voice Conversion

  • Xinjiang University
  • Public Security Department of Xinjiang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Voice conversion (VC) transforms certain characteristics of speech from a source to a target while preserving the original linguistic content. This paper focuses on timbre conversion, a key type of VC. Current VC methods face two challenges: the extracted content retains source speaker information, and timbre features are inadequately captured. Both often lead to suboptimal speaker similarity in the converted speech. To address these issues, we propose the TDE-VC model, a zero-shot voice conversion framework that incorporates a phased-trained content extractor, combining the strengths of an adversarial speaker classifier and data perturbation to extract cleaner content. Critically, we introduce a timbre disentanglement and extraction strategy, based on a multi-level consistency constraint, which effectively disentangles timbre from content and guides the timbre encoder to focus solely on timbre extraction. Additionally, we present an effective multi-scale timbre encoder. Experimental results demonstrate that TDE-VC significantly improves speaker similarity, especially for unseen target speakers, while maintaining competitive naturalness compared to existing methods. The demo page is publicly available.

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
DOI
Publication status: Published - 2025
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 to 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 to 4/07/25
