Chu Minghang, Wang Jing, Fan Zhiwei, Yang Mengtao, Xu Chao, Ma Yaoyao, Tao Zhi, Wu Di
School of Optoelectronic Science and Engineering, Soochow University, Suzhou, Jiangsu, China.
J Voice. 2023 Oct 14. doi: 10.1016/j.jvoice.2023.08.027.
Hoarse voice reduces the efficiency of communication between people. Surgical treatment, however, may leave patients with poorer voice quality, and voice repair techniques can only repair vowels. In this paper, we propose a novel multidomain generative adversarial voice conversion method that achieves hoarse-to-normal voice conversion and personalizes voices for patients with hoarseness. The method aims to improve the speech quality of hoarse voices through a multidomain generative adversarial network, and it is evaluated with both subjective and objective metrics. Spectrum analysis shows that the proposed method converts hoarse voice formants more effectively than the variational auto-encoder (VAE), Auto-VC (voice conversion), StarGAN-VC (Generative Adversarial Network-Voice Conversion), and CycleVAE. For word error rate, the proposed method obtains absolute gains of 35.62, 37.97, 45.42, and 50.05 over CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. In terms of naturalness, it outperforms CycleVAE, VAE, StarGAN-VC, and Auto-VC by 42.49%, 51.60%, 69.37%, and 77.54%, respectively. In terms of intelligibility, it outperforms VAE, CycleVAE, StarGAN-VC, and Auto-VC with absolute gains of 0.87, 0.93, 1.08, and 1.13, respectively. In terms of content similarity, it obtains improvements of 43.48%, 75.52%, 76.21%, and 108.62% over CycleVAE, StarGAN-VC, Auto-VC, and VAE, respectively. ABX results show that the proposed method can personalize the voice for patients with hoarseness. This study demonstrates the feasibility of voice conversion methods for improving the speech quality of hoarse voices.
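The abstract reports some results as absolute gains (word error rate, intelligibility) and others as percentage improvements (naturalness, content similarity). The following is a minimal sketch of how such figures are commonly computed. It is not taken from the paper: the function names, the whitespace tokenization, and the exact formulas are assumptions for illustration, and the authors' evaluation pipeline may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length.

    Assumption: words are split on whitespace; the paper does not state
    its tokenization or its speech recognizer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_reduction(baseline: float, proposed: float) -> float:
    """Absolute gain for a lower-is-better metric such as word error rate."""
    return baseline - proposed


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage improvement for a higher-is-better score."""
    return 100.0 * (proposed - baseline) / baseline
```

Under these definitions, a relative improvement above 100% (such as the 108.62% reported for content similarity against VAE) would mean the proposed score is more than double the baseline score.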