

A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.

Authors

Chu Minghang, Wang Jing, Fan Zhiwei, Yang Mengtao, Xu Chao, Ma Yaoyao, Tao Zhi, Wu Di

Affiliations

School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.


Publication information

J Voice. 2023 Oct 14. doi: 10.1016/j.jvoice.2023.08.027.

DOI: 10.1016/j.jvoice.2023.08.027
PMID: 37845148
Abstract

Hoarse voice impairs the efficiency of communication between people. However, surgical treatment can leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method that converts hoarse voices to normal voices and personalizes voices for patients with hoarseness. The proposed method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network, and it is evaluated with both subjective and objective metrics. Spectrum analysis shows that the proposed method converts the formants of hoarse voices more effectively than the variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (generative adversarial network voice conversion), and CycleVAE. For word error rate, the proposed method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 over CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. In terms of naturalness, it outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. In terms of intelligibility, it outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, it achieves improvements of 43.48%, 75.52%, 76.21%, and 108.62% over CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX test results show that the proposed method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of using voice conversion to improve the speech quality of hoarse voices.
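The word error rate comparison above can be made concrete with a small sketch. WER is conventionally defined as (S + D + I) / N: the substitutions, deletions, and insertions needed to turn a recognizer's transcript into the reference, divided by the number of reference words. The abstract does not say which speech recognizer or scoring tool the authors used, nor whether the reported gains are percentage points. The helper `absolute_gain` below assumes percentage-point differences, and both function names are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the number of reference words.

    Computed as the word-level Levenshtein distance between the two
    whitespace-tokenized transcripts, divided by the reference length.
    """
    ref: Sequence[str] = reference.split()
    hyp: Sequence[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j];
    # row 0 is the cost of inserting the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # deleting the first i reference words
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def absolute_gain(baseline_wer_pct: float, proposed_wer_pct: float) -> float:
    """Hypothetical helper: WER reduction of a proposed system over a baseline.

    Assumption: "absolute gain" is read here as a percentage-point difference,
    which the abstract does not state explicitly.
    """
    return baseline_wer_pct - proposed_wer_pct
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6: one deletion against six reference words. Because insertions are counted, WER can exceed 1.0 when a transcript contains many spurious words.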


Similar articles

1
A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.
J Voice. 2023 Oct 14. doi: 10.1016/j.jvoice.2023.08.027.
2
E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion.
IEEE J Biomed Health Inform. 2023 May;27(5):2489-2500. doi: 10.1109/JBHI.2023.3239551. Epub 2023 May 4.
3
GLGAN-VC: A Guided Loss-Based Generative Adversarial Network for Many-to-Many Voice Conversion.
IEEE Trans Neural Netw Learn Syst. 2025 Jan;36(1):1813-1826. doi: 10.1109/TNNLS.2023.3335119. Epub 2025 Jan 7.
4
Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.
IEEE Trans Biomed Eng. 2017 Nov;64(11):2584-2594. doi: 10.1109/TBME.2016.2644258.
5
Improving the Efficiency of Dysarthria Voice Conversion System Based on Data Augmentation.
IEEE Trans Neural Syst Rehabil Eng. 2023;31:4613-4623. doi: 10.1109/TNSRE.2023.3331524. Epub 2023 Nov 30.
6
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models.
SLT Workshop Spok Lang Technol. 2023 Jan;2022:920-927. doi: 10.1109/slt54892.2023.10022498.
7
[Variability in the digital voice analysis depending on the analyzed vocal, in normal patients and in patients with dysphonia].
Acta Otorrinolaringol Esp. 2000 Oct;51(7):618-28.
8
Data Augmentation for EEG-Based Emotion Recognition Using Generative Adversarial Networks.
Front Comput Neurosci. 2021 Dec 9;15:723843. doi: 10.3389/fncom.2021.723843. eCollection 2021.
9
Noise-robust voice conversion with domain adversarial training.
Neural Netw. 2022 Apr;148:74-84. doi: 10.1016/j.neunet.2022.01.003. Epub 2022 Jan 13.
10
Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations.
Entropy (Basel). 2023 Feb 18;25(2):375. doi: 10.3390/e25020375.