
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models

Authors

Yinghao Aaron Li, Cong Han, Nima Mesgarani

Affiliation

Department of Electrical Engineering, Columbia University, USA.

Publication

IEEE Spoken Language Technology Workshop (SLT). 2023 Jan;2022:920-927. doi: 10.1109/SLT54892.2023.10022498.

Abstract

One-shot voice conversion (VC) aims to convert speech from any source speaker to an arbitrary target speaker with only a few seconds of reference speech from the target speaker. This relies heavily on disentangling the speaker's identity and speech content, a task that still remains challenging. Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models. With cycle consistent and adversarial training, the style-based TTS models can perform transcription-guided one-shot VC with high fidelity and similarity. By learning an additional mel-spectrogram encoder through a teacher-student knowledge transfer and novel data augmentation scheme, our approach results in disentangled speech representation without needing the input text. The subjective evaluation shows that our approach can significantly outperform the previous state-of-the-art one-shot voice conversion models in both naturalness and similarity.
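The teacher–student transfer described in the abstract can be sketched in a few lines: a frozen "teacher" latent, standing in for the output of the text-guided TTS encoder, supervises a "student" mel-spectrogram encoder so that no transcription is needed at inference time. This is a minimal illustrative sketch only; the dimensions, the random stand-in data, and the single linear map used as the student are assumptions, not the paper's actual architecture or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only (not the paper's actual sizes).
T, n_mels, d_latent = 50, 80, 16

# "Teacher": the latent representation the frozen, text-guided TTS encoder
# would produce for one utterance (random stand-in values here).
h_teacher = rng.standard_normal((T, d_latent))

# "Student": a mel-spectrogram encoder trained to reproduce the teacher's
# latent; reduced to a single linear map W for this sketch.
mel = rng.standard_normal((T, n_mels))          # stand-in mel-spectrogram frames
W = rng.standard_normal((n_mels, d_latent)) * 0.01

def matching_loss(W):
    """Mean squared distance between student and teacher latents."""
    d = mel @ W - h_teacher
    return float((d * d).mean())

loss0 = matching_loss(W)
lr = 0.05
for _ in range(300):
    diff = mel @ W - h_teacher
    W -= lr * (mel.T @ diff) / T                # (rescaled) gradient step

loss1 = matching_loss(W)                        # should be well below loss0
```

In the paper the student is a full mel encoder trained with additional data augmentation, cycle-consistency, and adversarial objectives; this sketch shows only the core distillation idea of matching the frozen teacher's representation.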


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/936c/10417535/8bda43a88c01/nihms-1919646-f0001.jpg

Similar Articles

Noise-robust voice conversion with domain adversarial training.
Neural Netw. 2022 Apr;148:74-84. doi: 10.1016/j.neunet.2022.01.003. Epub 2022 Jan 13.

Cycle consistent network for end-to-end style transfer TTS training.
Neural Netw. 2021 Aug;140:223-236. doi: 10.1016/j.neunet.2021.03.005. Epub 2021 Mar 16.

Attention-based speech feature transfer between speakers.
Front Artif Intell. 2024 Feb 26;7:1259641. doi: 10.3389/frai.2024.1259641. eCollection 2024.
