Xie Xiaoping, Cai Hao, Li Can, Wu Yu, Ding Fei
The State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha, China; Shenzhen Research Institute of Hunan University, Shenzhen, China.
The State Key Laboratory of Advanced Design and Manufacturing for Vehicle Body, Hunan University, Changsha, China.
J Voice. 2023 Oct 25. doi: 10.1016/j.jvoice.2023.09.024.
The incidence of voice diseases is increasing year by year, and software-based remote diagnosis is a growing technical trend with important practical value. Among voice diseases, common causes of hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodules, and vocal cord polyps. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. In cooperation with Xiangya Hospital of Central South University, we collected voice samples from 352 different patients. Mel Frequency Cepstral Coefficient (MFCC) parameters are extracted as input features that describe the voice numerically. We propose an innovative model that combines MFCC parameters with a convolutional neural network (CNN) containing a single convolutional layer, for fast computation and classification. The highest accuracy we achieved was 92%, which exceeds the results of earlier research and is at an internationally advanced level. We also used the Advanced Voice Function Assessment Database (AVFAD) to evaluate the generalization ability of the proposed method, which achieved an accuracy of 98%. Experiments on clinical and standard datasets show that, for the pathological detection of voice diseases, our method substantially improves both accuracy and computational efficiency.
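The pipeline described in the abstract (MFCC features fed to a CNN with one convolutional layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no hyperparameters, so the sampling rate, frame length, hop size, filter count, number of coefficients, kernel size, number of kernels, the two-class output, and the function names `mfcc` and `single_conv_cnn` are all assumptions made for illustration. The weights are random and untrained, and the input is a synthetic tone rather than a patient recording.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix. All defaults are assumed."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum, mel filterbank energies, then log compression.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_mel @ basis.T

def single_conv_cnn(features, kernels, bias, dense_w, dense_b):
    """Forward pass: one conv layer, ReLU, global max pool, dense, softmax."""
    windows = np.lib.stride_tricks.sliding_window_view(
        features, kernels.shape[1:])
    fmap = np.einsum("hwij,kij->khw", windows, kernels) + bias[:, None, None]
    fmap = np.maximum(fmap, 0.0)        # ReLU
    pooled = fmap.max(axis=(1, 2))      # one value per kernel
    logits = dense_w @ pooled + dense_b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Synthetic stand-in for a one-second voice recording (not real data).
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
voice = np.sin(2 * np.pi * 150 * t) + 0.05 * rng.standard_normal(sr)

feats = mfcc(voice, sr=sr)
kernels = rng.standard_normal((8, 3, 3)) * 0.1   # 8 assumed 3x3 kernels
probs = single_conv_cnn(feats, kernels, np.zeros(8),
                        rng.standard_normal((2, 8)) * 0.1, np.zeros(2))
print(feats.shape, probs)
```

With a single convolutional layer, the forward pass is one sliding-window product over the MFCC matrix followed by a small dense layer, which is in keeping with the fast computation the abstract emphasizes. In practice the kernels and dense weights would be learned from labeled recordings.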