Multiple Vowels Repair Based on Pitch Extraction and Line Spectrum Pair Feature for Voice Disorder.

Publication information

IEEE J Biomed Health Inform. 2020 Jul;24(7):1940-1951. doi: 10.1109/JBHI.2020.2978103. Epub 2020 Mar 3.

Abstract

Voice disorders increasingly affect individuals such as voice-related professionals, elderly people and smokers, which underscores the importance of pathological voice repair. Previous work on pathological voice repair addressed only the sustained vowel /a/; repairing multiple vowels remains challenging because pitch extraction is unstable and formant reconstruction is unsatisfactory. This paper proposes a multiple-vowel repair method for voice disorders based on pitch extraction and the Line Spectrum Pair (LSP) feature, which broadens the scope of voice repair from the single vowel /a/ to the vowels /a/, /i/ and /u/ and repairs all three successfully. A deep neural network is used as a classifier in a voice recognition step that separates normal from pathological voices. The Wavelet Transform and the Hilbert-Huang Transform are applied for pitch extraction, and the formant is reconstructed from the LSP feature. The final repaired voice is obtained by synthesizing the pitch and the formant. The proposed method is validated on the Saarbrücken Voice Database (SVD). The improvements achieved on three metrics, Segmental Signal-to-Noise Ratio, LSP distance measure and Mel cepstral distance measure, are 45.87%, 50.37% and 15.56%, respectively. In addition, an intuitive spectrogram-based analysis shows a prominent repair effect.
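To make the first of the three metrics concrete, here is a minimal Python sketch of a segmental signal-to-noise ratio between a reference signal and a processed one. The abstract does not state the exact definition used in the paper, so this follows a commonly used formulation; the function name `segmental_snr`, the frame length, the hop size and the clamping range of -10 to 35 dB are all assumptions made for illustration.

```python
import numpy as np


def segmental_snr(reference, processed, frame_len=256, hop=128,
                  snr_min=-10.0, snr_max=35.0, eps=1e-10):
    """Mean per-frame SNR in dB (illustrative; not the paper's exact definition)."""
    reference = np.asarray(reference, dtype=float)
    processed = np.asarray(processed, dtype=float)
    n = min(len(reference), len(processed))
    if n < frame_len:
        raise ValueError("signals are shorter than one frame")

    frame_snrs = []
    for start in range(0, n - frame_len + 1, hop):
        ref = reference[start:start + frame_len]
        err = ref - processed[start:start + frame_len]
        # Ratio of reference energy to error energy within this frame.
        snr_db = 10.0 * np.log10((np.sum(ref ** 2) + eps) / (np.sum(err ** 2) + eps))
        # Clamp each frame (assumed range) so silent or perfectly matched
        # frames do not dominate the average.
        frame_snrs.append(np.clip(snr_db, snr_min, snr_max))
    return float(np.mean(frame_snrs))


# Synthetic usage: a 200 Hz tone with two different levels of added noise.
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 200 * t)
noise = np.random.default_rng(0).standard_normal(fs)
snr_light = segmental_snr(clean, clean + 0.05 * noise)
snr_heavy = segmental_snr(clean, clean + 0.5 * noise)
```

Averaging the SNR frame by frame, rather than computing one ratio over the whole recording, keeps loud segments from masking errors in quiet ones. That property is one reason the measure is often used for speech signals.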
