Ma Shuaichi, Liao Wenwen, Zhang Yi, Zhang Fan, Wang Yimiao, Lu Zhiyan, Zhao Chen, Yu Jianbo, He Peijie
School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China.
ENT Institute and Department of Otorhinolaryngology, Eye & ENT Hospital, Fudan University, 83 Fenyang Road, Shanghai, 200031, People's Republic of China.
Biomed Eng Online. 2025 Jun 21;24(1):76. doi: 10.1186/s12938-025-01401-9.
This study aims to develop an AI-powered platform that uses Mel-spectrogram analysis and a convolutional neural network (CNN) to automate severity assessment of unilateral vocal cord paralysis (UVCP) from voice recordings, providing an objective basis for individualized clinical treatment planning.
To accurately identify the severity of UVCP, this study developed the CNN model TripleConvNet. Voice samples were collected from 131 healthy individuals and 292 patients with confirmed UVCP at the Eye and ENT Hospital of Fudan University. Based on vocal fold compensation function, the patients were divided into three groups: decompensated (84 cases), partially compensated (98 cases), and fully compensated (110 cases). Using Mel-spectrograms together with their first- and second-order delta features as inputs, TripleConvNet classified voices by severity, and its performance on the UVCP severity-grading task was systematically evaluated.
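A minimal sketch of the feature pipeline described above, using librosa to stack a log-Mel spectrogram with its first- and second-order deltas into a three-channel input. The sample rate, number of Mel bands, file path, and function name are illustrative assumptions; the abstract does not specify the paper's actual parameters or the TripleConvNet architecture.

```python
import numpy as np
import librosa

def mel_delta_features(wav_path, sr=16000, n_mels=128):
    """Return a (3, n_mels, frames) array: log-Mel spectrogram plus
    its first- and second-order deltas, suitable as CNN input."""
    y, _ = librosa.load(wav_path, sr=sr)                 # load and resample the voice sample
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)        # convert power to decibel scale
    delta1 = librosa.feature.delta(log_mel, order=1)       # first-order temporal difference
    delta2 = librosa.feature.delta(log_mel, order=2)       # second-order temporal difference
    return np.stack([log_mel, delta1, delta2], axis=0)
```

Stacking the deltas as extra channels lets a standard 2D CNN see both the spectral envelope and its temporal dynamics in a single image-like tensor.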
TripleConvNet achieved an overall classification accuracy of 74.3% on the four-class task of distinguishing healthy voices from the decompensated, partially compensated, and fully compensated UVCP groups.
This study demonstrates the potential of deep learning-based non-invasive voice analysis for precise grading of UVCP severity. The proposed method offers a promising clinical tool to assist physicians in disease assessment and personalized treatment planning.