Kuo Yao-Ming, Ruan Shanq-Jang, Chen Yu-Chin, Tu Ya-Wen
Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan.
Sijhih Cathay General Hospital, New Taipei 221, Taiwan.
Children (Basel). 2022 Jul 1;9(7):996. doi: 10.3390/children9070996.
This article describes a computer-assisted system that analyzes acoustic data to aid the diagnosis and classification of speech sound disorders (SSDs) in children. The analysis concentrates on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 samples of stopping, backing, final consonant deletion process (FCDP), and affrication from 90 children aged 3-6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation from two speech-language pathologists (SLPs). The speech samples were classified using three well-established neural network models for image classification. Feature maps were created from three sets of Mel-frequency cepstral coefficient (MFCC) parameters extracted from the speech sounds and aggregated into a three-dimensional data structure as model input. Six data augmentation techniques were employed to enlarge the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters, and experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4%.
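The abstract states that three sets of MFCC parameters are aggregated into a three-dimensional structure used as an image-like model input, but does not name the three sets. A minimal sketch of one common choice for such a stacking, assuming the three channels are the static MFCCs plus their first- and second-order frame-to-frame differences (an assumption, not necessarily the authors' exact feature sets; `stack_mfcc_channels` is a hypothetical helper name):

```python
import numpy as np

def stack_mfcc_channels(mfcc: np.ndarray) -> np.ndarray:
    """Stack static MFCCs with first- and second-order deltas into a
    3-channel, image-like array of shape (3, n_mfcc, n_frames)."""
    delta = np.gradient(mfcc, axis=1)    # frame-to-frame change of each coefficient
    delta2 = np.gradient(delta, axis=1)  # change of the change (acceleration)
    return np.stack([mfcc, delta, delta2], axis=0)

# Toy example: 13 coefficients over 100 frames of a synthetic utterance.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((13, 100))
features = stack_mfcc_channels(mfcc)
print(features.shape)  # (3, 13, 100)
```

The resulting (channels, height, width) array can be fed directly to standard image-classification networks, which treat the three MFCC-derived planes the way they would treat RGB channels.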