Deep-Learning-Based Automated Classification of Chinese Speech Sound Disorders.

Authors

Kuo Yao-Ming, Ruan Shanq-Jang, Chen Yu-Chin, Tu Ya-Wen

Affiliations

Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan.

Sijhih Cathay General Hospital, New Taipei 221, Taiwan.

Publication information

Children (Basel). 2022 Jul 1;9(7):996. doi: 10.3390/children9070996.

Abstract

This article describes a computer-based system that analyzes acoustic data to assist in diagnosing and classifying children's speech sound disorders (SSDs). The analysis focuses on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3 to 6 years with normal or pathological articulatory features. Each recording is accompanied by a detailed diagnostic annotation from two speech-language pathologists (SLPs). The speech samples were classified using three well-established neural network models for image classification. The feature maps were built from three sets of Mel-frequency cepstral coefficient (MFCC) parameters extracted from the speech sounds and aggregated into a three-dimensional data structure that serves as the model input. Six data augmentation techniques were used to enlarge the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4%.
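The abstract says that three sets of MFCC parameters are aggregated into a three-dimensional structure so that image-classification networks can consume them. The sketch below illustrates one way this could look, using NumPy only. It is not the authors' implementation: the abstract does not say how the three sets differ, so the three frame lengths, the hop size, the filter counts, and the names `mel_filterbank`, `mfcc`, and `build_feature_map` are all assumptions made for illustration.

```python
import numpy as np


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    max_mel = 2595.0 * np.log10(1.0 + (sr / 2.0) / 700.0)
    mel_pts = np.linspace(0.0, max_mel, n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, mid, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mfcc(signal, sr, n_fft, hop, n_mels=40, n_mfcc=20):
    """Return an MFCC matrix of shape (n_mfcc, n_frames)."""
    # Centre-pad so the frame count depends only on the hop, not on n_fft.
    padded = np.pad(signal, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(padded) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sr).T + 1e-10)
    # Unnormalised DCT-II over the mel axis yields the cepstral coefficients.
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return dct @ log_mel.T


def build_feature_map(signal, sr, frame_lengths=(256, 512, 1024), hop=160):
    """Stack one MFCC matrix per frame length into an image-like array.

    ASSUMPTION: the three "sets of MFCC parameters" are modelled here as
    three analysis frame lengths; the source does not specify them.
    """
    channels = [mfcc(signal, sr, n_fft, hop) for n_fft in frame_lengths]
    return np.stack(channels, axis=-1)  # (n_mfcc, n_frames, 3)


# Synthetic one-second test tone standing in for a speech recording.
sr = 16000
rng = np.random.default_rng(0)
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t) + 0.01 * rng.standard_normal(sr)
feature_map = build_feature_map(signal, sr)
print(feature_map.shape)
```

Because every channel shares the same hop and is centre-padded, the three matrices have identical shapes and can be stacked like the colour channels of an image, which is what lets an off-the-shelf image classifier accept them as input.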

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eac7/9324778/4e76cbc9bf7c/children-09-00996-g006.jpg
