

Image classification-driven speech disorder detection using deep learning technique.

Authors

Aljarallah Nasser Ali, Dutta Ashit Kumar, Sait Abdul Rahaman Wahab

Affiliations

Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University, Ad Diriyah, Riyadh, 13713, Saudi Arabia.


Publication

SLAS Technol. 2025 Jun;32:100261. doi: 10.1016/j.slast.2025.100261. Epub 2025 Mar 6.

DOI: 10.1016/j.slast.2025.100261
PMID: 40057233
Abstract

Speech disorders affect an individual's ability to generate sounds or use the voice appropriately. Neurological, developmental, physical, and traumatic factors may cause speech disorders. Speech impairments influence communication, social interaction, education, and quality of life. Successful intervention entails early and precise diagnosis to allow prompt treatment of these conditions. However, clinical examinations by speech-language pathologists are time-consuming and subjective, motivating an automated speech disorder detection (SDD) model. Mel-spectrogram images present a visual representation of multiple speech disorders; by classifying Mel-spectrograms, various speech disorders can be identified. In this study, the authors proposed an image classification-based automated SDD model that classifies Mel-spectrograms to identify multiple speech disorders. Initially, a Wavelet Transform (WT) hybridization technique was employed to generate Mel-spectrograms from the voice samples. A feature extraction approach was then developed using an enhanced LeViT transformer. Finally, the extracted features were classified using an ensemble learning (EL) approach, with CatBoost and XGBoost as base learners and an Extremely Randomized Trees classifier as the meta-learner. To reduce computational resources, the authors used quantization-aware training (QAT), and they employed Shapley Additive Explanations (SHAP) values to provide model interpretability. The proposed model was generalized using the Voice ICar fEDerico II (VOICED) and LANNA datasets. An exceptional accuracy of 99.1% with only 8.2 million parameters demonstrates the significance of the proposed approach. The proposed model enhances speech disorder classification and offers novel prospects for building accessible, accurate, and efficient diagnostic tools. Researchers may integrate multimodal data to extend the model's use across languages and dialects, refining it for real-time clinical and telehealth deployment.
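The classification stage described in the abstract — boosted base learners stacked under an Extremely Randomized Trees meta-learner — can be sketched as follows. This is a minimal illustration of the stacking pattern, not the authors' implementation: CatBoost and XGBoost are replaced here with scikit-learn's `GradientBoostingClassifier` as stand-in base learners, and synthetic feature vectors stand in for the embeddings the paper extracts from Mel-spectrograms with an enhanced LeViT transformer.

```python
# Sketch of a stacking ensemble for speech-disorder classification.
# NOTE: illustrative only. The paper's CatBoost/XGBoost base learners are
# replaced by scikit-learn's GradientBoostingClassifier, and X stands in
# for LeViT-derived Mel-spectrogram features (one row per voice sample).
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for extracted feature vectors and disorder labels.
X, y = make_classification(n_samples=300, n_features=32, n_informative=10,
                           n_classes=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

# Base learners produce out-of-fold predictions (cv=5) that are fed to the
# Extremely Randomized Trees meta-learner, mirroring the paper's EL design.
ensemble = StackingClassifier(
    estimators=[
        ("boost_a", GradientBoostingClassifier(n_estimators=100,
                                               random_state=0)),
        ("boost_b", GradientBoostingClassifier(n_estimators=100,
                                               learning_rate=0.05,
                                               random_state=1)),
    ],
    final_estimator=ExtraTreesClassifier(n_estimators=200, random_state=0),
    cv=5,
)
ensemble.fit(X_train, y_train)
print(f"held-out accuracy: {ensemble.score(X_test, y_test):.3f}")
```

Stacking lets the meta-learner weigh the base learners' complementary error patterns, which is the usual motivation for pairing two different boosting implementations under a tree-based combiner.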


Similar Articles

1. Image classification-driven speech disorder detection using deep learning technique.
   SLAS Technol. 2025 Jun;32:100261. doi: 10.1016/j.slast.2025.100261. Epub 2025 Mar 6.
2. Multimodal depression detection based on an attention graph convolution and transformer.
   Math Biosci Eng. 2025 Feb 27;22(3):652-676. doi: 10.3934/mbe.2025024.
3. Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
   Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
4. Deep learning in automatic detection of dysphonia: Comparing acoustic features and developing a generalizable framework.
   Int J Lang Commun Disord. 2023 Mar;58(2):279-294. doi: 10.1111/1460-6984.12783. Epub 2022 Sep 18.
5. TranStutter: A Convolution-Free Transformer-Based Deep Learning Method to Classify Stuttered Speech Using 2D Mel-Spectrogram Visualization and Attention-Based Feature Representation.
   Sensors (Basel). 2023 Sep 22;23(19):8033. doi: 10.3390/s23198033.
6. Lightweight hybrid transformers-based dyslexia detection using cross-modality data.
   Sci Rep. 2025 May 16;15(1):17054. doi: 10.1038/s41598-025-01235-4.
7. ResViT FusionNet Model: An explainable AI-driven approach for automated grading of diabetic retinopathy in retinal images.
   Comput Biol Med. 2025 Mar;186:109656. doi: 10.1016/j.compbiomed.2025.109656. Epub 2025 Jan 16.
8. Deep learning-based classification of speech disorder in stroke and hearing impairment.
   PLoS One. 2025 May 28;20(5):e0315286. doi: 10.1371/journal.pone.0315286. eCollection 2025.
9. An enhanced speech emotion recognition using vision transformer.
   Sci Rep. 2024 Jun 7;14(1):13126. doi: 10.1038/s41598-024-63776-4.
10. Improving Breast Cancer Diagnosis in Ultrasound Images Using Deep Learning with Feature Fusion and Attention Mechanism.
    Acad Radiol. 2025 May 27. doi: 10.1016/j.acra.2025.05.007.

Cited By

1. Machine Learning-Based Identification of Phonological Biomarkers for Speech Sound Disorders in Saudi Arabic-Speaking Children.
   Diagnostics (Basel). 2025 May 31;15(11):1401. doi: 10.3390/diagnostics15111401.