Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders

Authors

Kuo Yao-Ming, Ruan Shanq-Jang, Chen Yu-Chin, Tu Ya-Wen

Affiliations

Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan.

Sijhih Cathay General Hospital, New Taipei 221, Taiwan.

Publication information

Children (Basel). 2022 Jul 1;9(7):996. doi: 10.3390/children9070996.

DOI:10.3390/children9070996

PMID:35883979

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9324778/

Abstract

This article describes a computer-based system that analyzes acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs). The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3-6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech-language pathologists (SLPs). The speech samples were classified using three well-established neural network models for image classification. The feature maps were created from three sets of Mel-frequency cepstral coefficient (MFCC) parameters extracted from the speech sounds and aggregated into a three-dimensional data structure as model input. We applied six data augmentation techniques to enlarge the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4%.
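The abstract states that three sets of MFCC parameters are aggregated into a three-dimensional structure for an image classifier, but it does not say what distinguishes the three sets. The following is a minimal NumPy-only sketch under the assumption that the sets differ in analysis window length; the function names, the 16 kHz sample rate, the window lengths, the hop size, and the filter counts are all illustrative choices, not the authors' settings.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):          # rising edge of the triangle
            fb[i, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge of the triangle
            fb[i, k] = (hi - k) / max(hi - mid, 1)
    return fb

def mfcc(signal, sr, win_len, hop_len, n_fft=512, n_mels=26, n_mfcc=13):
    """Return MFCCs with shape (n_mfcc, n_frames). Requires win_len <= n_fft."""
    n_frames = 1 + (len(signal) - win_len) // hop_len
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the mel axis, keeping the first n_mfcc coefficients.
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.arange(n_mfcc)[:, None] * (2 * n[None, :] + 1) / (2 * n_mels))
    return (log_mel @ basis.T).T

def stacked_mfcc(signal, sr=16000, win_lens=(256, 384, 512), hop_len=160):
    """Stack one MFCC map per window length into (n_mfcc, n_frames, 3).

    ASSUMPTION: the three "parameter sets" are three window lengths.
    Maps are cropped to the shortest frame count so they align.
    """
    maps = [mfcc(signal, sr, w, hop_len) for w in win_lens]
    n_frames = min(m.shape[1] for m in maps)
    return np.stack([m[:, :n_frames] for m in maps], axis=-1)

# Stand-in input: one second of noise instead of a real child-speech recording.
rng = np.random.default_rng(0)
demo = rng.standard_normal(16000)
features = stacked_mfcc(demo)
print(features.shape)  # (13, 97, 3)
```

The resulting array has the same layout as a three-channel image, which is what allows an off-the-shelf image classification network to consume it; any resizing or normalization the study applied before training is not described in the abstract and is omitted here.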


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eac7/9324778/4e76cbc9bf7c/children-09-00996-g006.jpg
