

Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.

Author information

Ahn Taekyung, Hong Yeonjung, Im Younggon, Kim Do Hyung, Kang Dayoung, Jeong Joo Won, Kim Jae Won, Kim Min Jung, Cho Ah-Ra, Nam Hosung, Jang Dae-Hyun

Affiliations

Department of English Language and Literature, Korea University, Seoul, Republic of Korea.

AI R&D Group, MediaZen, Seongnam-si, Republic of Korea.

Publication information

Clin Linguist Phon. 2024 Aug 20:1-14. doi: 10.1080/02699206.2024.2387609.

Abstract

This study presents an automatic speech recognition (ASR) model designed to diagnose pronunciation issues in children with speech sound disorders (SSDs), with the aim of replacing manual transcription in clinical procedures. Because ASR models trained for general purposes mainly map input speech to the standard spellings of words, well-known high-performance ASR models are not suitable for evaluating pronunciation in children with SSDs. We fine-tuned the wav2vec2.0 XLS-R model to recognise words as children actually pronounce them, rather than converting the speech into standard spellings. The model was fine-tuned on a speech dataset of 137 children with SSDs pronouncing 73 Korean words selected for use in actual clinical diagnosis. The model's Phoneme Error Rate (PER) was only 10% when its predictions of children's pronunciations were compared with human annotations of the pronunciations as heard. In contrast, despite its robust performance on general tasks, the state-of-the-art ASR model Whisper showed limitations in recognising the speech of children with SSDs, with a PER of approximately 50%. While the model still requires improvement in recognising unclear pronunciation, this study demonstrates that ASR models can streamline complex diagnostic procedures for pronunciation errors in clinical settings.
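The abstract reports PER without stating how it was computed. As a minimal sketch, assuming the conventional definition (Levenshtein edit distance between the predicted and the human-annotated phoneme sequences, summed over utterances and divided by the total number of reference phonemes), the metric could be calculated as follows. The function names and the romanised phoneme sequences are illustrative assumptions, not taken from the study.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j phonemes of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edit errors / total reference phonemes."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions are empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Made-up example: the annotator heard five phonemes, and the model
# predicted a different first phoneme (one substitution).
annotated = [["s", "a", "g", "w", "a"]]
predicted = [["t", "a", "g", "w", "a"]]
per = phoneme_error_rate(annotated, predicted)
```

Under this definition, the reference is the annotation of the pronunciation as heard, not the standard spelling of the target word, which matches the comparison the abstract describes.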

