Aljarallah Nasser Ali, Dutta Ashit Kumar, Sait Abdul Rahaman Wahab
Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh, 13713, Saudi Arabia.
SLAS Technol. 2025 Jun;32:100261. doi: 10.1016/j.slast.2025.100261. Epub 2025 Mar 6.
Speech disorders affect an individual's ability to produce sounds or use the voice appropriately. They may be caused by neurological, developmental, or physical conditions, or by trauma. Speech impairments influence communication, social interaction, education, and quality of life. Successful intervention requires early and precise diagnosis so that these conditions can be treated promptly. However, clinical examinations by speech-language pathologists are time-consuming and subjective, motivating an automated speech disorder detection (SDD) model. Mel-spectrogram images provide a visual representation of multiple speech disorders, and classifying them allows various speech disorders to be identified. In this study, the authors proposed an image classification-based automated SDD model that classifies Mel-spectrograms to identify multiple speech disorders. Initially, a Wavelet Transform (WT) hybridization technique was employed to generate Mel-spectrograms from the voice samples. A feature extraction approach was then developed using an enhanced LEVIT transformer. Finally, the extracted features were classified using an ensemble learning (EL) approach comprising CatBoost and XGBoost as base learners and an Extremely Randomized Trees meta-learner. To reduce computational resources, the authors used quantization-aware training (QAT), and they employed Shapley Additive Explanations (SHAP) values to provide model interpretability. The proposed model was generalized using the Voice ICar fEDerico II (VOICED) and LANNA datasets. An exceptional accuracy of 99.1 % with only 8.2 million parameters demonstrates the significance of the proposed approach. The proposed model enhances speech disorder classification and offers novel prospects for building accessible, accurate, and efficient diagnostic tools. Researchers may integrate multimodal data to extend the model's use across languages and dialects, refining it for real-time clinical and telehealth deployment.
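To make the classification stage of the abstract concrete, the sketch below shows a stacked ensemble with CatBoost and XGBoost base learners and an Extremely Randomized Trees meta-learner over features summarized from Mel-spectrograms. It is a minimal illustration, assuming scikit-learn's StackingClassifier and the catboost, xgboost, and librosa packages; the WT hybridization, enhanced LEVIT feature extractor, QAT, and SHAP steps are not reproduced, and the placeholder_features helper and synthetic demo data are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of the stacked ensemble stage described in the abstract.
# Assumptions: plain librosa Mel-spectrogram statistics stand in for the
# enhanced LEVIT transformer embedding; the demo data below are synthetic.
import numpy as np
import librosa
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier
from xgboost import XGBClassifier


def placeholder_features(wav_path: str, sr: int = 16000, n_mels: int = 64) -> np.ndarray:
    """Load a voice sample and summarize its Mel-spectrogram as a fixed-length vector."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Per-band mean and std as a simple stand-in for a learned embedding.
    return np.concatenate([mel_db.mean(axis=1), mel_db.std(axis=1)])


def build_ensemble() -> StackingClassifier:
    """Stacked EL model: CatBoost + XGBoost base learners, Extra Trees meta-learner."""
    base_learners = [
        ("catboost", CatBoostClassifier(iterations=200, depth=6, verbose=0)),
        ("xgboost", XGBClassifier(n_estimators=200, max_depth=6, eval_metric="logloss")),
    ]
    meta_learner = ExtraTreesClassifier(n_estimators=200, random_state=0)
    return StackingClassifier(estimators=base_learners, final_estimator=meta_learner, cv=5)


if __name__ == "__main__":
    # Synthetic demo so the pipeline runs without audio files; with real data,
    # build X with np.vstack([placeholder_features(p) for p in wav_paths]).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 128))      # 200 fake feature vectors
    y = rng.integers(0, 2, size=200)     # fake disordered / healthy labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = build_ensemble()
    clf.fit(X_train, y_train)
    print("Held-out accuracy:", clf.score(X_test, y_test))
```

With real recordings (e.g. from the VOICED or LANNA corpora) the placeholder feature function would be replaced by the paper's WT-hybridized spectrogram generation and transformer embedding; the stacking structure itself mirrors the base-learner/meta-learner arrangement the abstract describes.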