E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion.

Publication information

IEEE J Biomed Health Inform. 2023 May;27(5):2489-2500. doi: 10.1109/JBHI.2023.3239551. Epub 2023 May 4.

DOI:10.1109/JBHI.2023.3239551
PMID:37022002
Abstract

In recent years, a growing number of people have suffered from voice-related diseases. Current pathological speech conversion methods are limited in that each method can convert only a single kind of pathological voice. In this study, we propose a novel Encoder-Decoder Generative Adversarial Network (E-DGAN) that generates personalized speech for pathological-to-normal voice conversion and is suitable for multiple kinds of pathological voices. The proposed method also addresses the problems of improving the intelligibility of pathological voices and personalizing the converted speech. Feature extraction is performed using a mel filter bank. The conversion network is an encoder-decoder structure that converts the mel spectrogram of a pathological voice into the mel spectrogram of a normal voice. After conversion by the residual conversion network, the personalized normal speech is synthesized by a neural vocoder. In addition, we propose a subjective evaluation metric named "content similarity" to evaluate the consistency between the content of the converted pathological voice and the reference content. The Saarbrücken Voice Database (SVD) is used to verify the proposed method. The intelligibility and content similarity of pathological voices are increased by 18.67% and 2.60%, respectively. In addition, an intuitive spectrogram-based analysis shows a significant improvement. The results show that the proposed method can improve the intelligibility of pathological voices and personalize their conversion into the normal voices of 20 different speakers. Compared with five other pathological voice conversion methods, the proposed method achieves the best evaluation results.
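
The abstract states only that features are extracted with a mel filter bank; it gives no frame size, hop length, or number of bands. As a rough illustration of that step, the following NumPy sketch computes a log-mel spectrogram. The sample rate, FFT size, hop, band count, and all function names are assumptions chosen for the example, not values or code from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the paper does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_bank(sr, n_fft, n_mels):
    """Triangular filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Return an array of shape (n_frames, n_mels). All defaults are illustrative."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filter_bank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # small offset avoids log(0)

# Usage: one second of a 440 Hz tone standing in for a voice recording.
sr = 16000
t = np.arange(sr) / sr
features = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

In a pipeline like the one the abstract describes, an array such as `features` would be the input to the encoder-decoder conversion network, and the converted mel spectrogram would then be passed to a neural vocoder; neither of those stages is sketched here.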

Similar articles

1. E-DGAN: An Encoder-Decoder Generative Adversarial Network Based Method for Pathological to Normal Voice Conversion.
   IEEE J Biomed Health Inform. 2023 May;27(5):2489-2500. doi: 10.1109/JBHI.2023.3239551. Epub 2023 May 4.
2. A Multidomain Generative Adversarial Network for Hoarse-to-Normal Voice Conversion.
   J Voice. 2023 Oct 14. doi: 10.1016/j.jvoice.2023.08.027.
3. Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.
   IEEE J Biomed Health Inform. 2020 Jul;24(7):1940-1951. doi: 10.1109/JBHI.2020.2978103. Epub 2020 Mar 3.
4. Noise-robust voice conversion with domain adversarial training.
   Neural Netw. 2022 Apr;148:74-84. doi: 10.1016/j.neunet.2022.01.003. Epub 2022 Jan 13.
5. Robustness of auditory Teager Energy Cepstrum Coefficients for classification of pathological and normal voices in noisy environments.
   ScientificWorldJournal. 2013 May 28;2013:435729. doi: 10.1155/2013/435729. Print 2013.
6. Voice pathology detection and classification from speech signals and EGG signals based on a multimodal fusion method.
   Biomed Tech (Berl). 2021 Nov 29;66(6):613-625. doi: 10.1515/bmt-2021-0112. Print 2021 Dec 20.
7. The effect of visible speech in the perceptual rating of pathological voices.
   Arch Otolaryngol Head Neck Surg. 2007 Feb;133(2):178-85. doi: 10.1001/archotol.133.2.178.
8. Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients.
   Comput Methods Programs Biomed. 2022 Mar;215:106602. doi: 10.1016/j.cmpb.2021.106602. Epub 2021 Dec 26.
9. Support vector wavelet adaptation for pathological voice assessment.
   Comput Biol Med. 2011 Sep;41(9):822-8. doi: 10.1016/j.compbiomed.2011.06.019. Epub 2011 Jul 20.
10. Joint Dictionary Learning-Based Non-Negative Matrix Factorization for Voice Conversion to Improve Speech Intelligibility After Oral Surgery.
    IEEE Trans Biomed Eng. 2017 Nov;64(11):2584-2594. doi: 10.1109/TBME.2016.2644258.