Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.

Authors

Ahn Taekyung, Hong Yeonjung, Im Younggon, Kim Do Hyung, Kang Dayoung, Jeong Joo Won, Kim Jae Won, Kim Min Jung, Cho Ah-Ra, Nam Hosung, Jang Dae-Hyun

Affiliations

Department of English Language and Literature, Korea University, Seoul, Republic of Korea.

AI R&D Group, MediaZen, Seongnam-si, Republic of Korea.

Publication details

Clin Linguist Phon. 2024 Aug 20:1-14. doi: 10.1080/02699206.2024.2387609.

DOI: 10.1080/02699206.2024.2387609
PMID: 39162064

Abstract

This study presents a model of automatic speech recognition (ASR) that is designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Because ASR models trained for general purposes mainly predict input speech into standard spelling words, well-known high-performance ASR models are not suitable for evaluating pronunciation in children with SSDs. We fine-tuned the wav2vec2.0 XLS-R model to recognise words as they are pronounced by children, rather than converting the speech into their standard spelling words. The model was fine-tuned with a speech dataset of 137 children with SSDs pronouncing 73 Korean words that are selected for actual clinical diagnosis. The model's Phoneme Error Rate (PER) was only 10% when its predictions of children's pronunciations were compared to human annotations of pronunciations as heard. In contrast, despite its robust performance on general tasks, the state-of-the-art ASR model Whisper showed limitations in recognising the speech of children with SSDs, with a PER of approximately 50%. While the model still requires improvement in terms of the recognition of unclear pronunciation, this study demonstrates that ASR models can streamline complex pronunciation error diagnostic procedures in clinical fields.
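
The abstract reports results as Phoneme Error Rate (PER) but does not spell out how it is computed. A minimal sketch follows, assuming the common definition: the Levenshtein (edit) distance between the predicted and the human-annotated phoneme sequences, divided by the number of phonemes in the annotation. The function names and the example phoneme sequences are hypothetical illustrations, not data or code from the study, and the authors' exact scoring procedure may differ.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (phoneme in ref missing from hyp)
                curr[j - 1] + 1,     # insertion (extra phoneme in hyp)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit errors divided by total reference phonemes (assumed definition)."""
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcriptions contain no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


# Hypothetical example: annotation "as heard" vs. a model prediction.
annotated = ["s", "a", "g", "w", "a"]
predicted = ["t", "a", "g", "a"]  # one substitution (s -> t), one deletion (w)
print(phoneme_error_rate([annotated], [predicted]))  # 2 errors / 5 phonemes = 0.4
```

Under this definition, a PER of 10% corresponds to roughly one edit per ten annotated phonemes, and a PER of about 50% to roughly one edit per two.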

Similar articles

1
Automatic speech recognition (ASR) for the diagnosis of pronunciation of speech sound disorders in Korean children.
Clin Linguist Phon. 2024 Aug 20:1-14. doi: 10.1080/02699206.2024.2387609.
2
Automatic Analysis of Pronunciations for Children with Speech Sound Disorders.
Comput Speech Lang. 2018 Jul;50:62-84. doi: 10.1016/j.csl.2017.12.006. Epub 2017 Dec 27.
3
Development and benchmarking of a Korean audio speech recognition model for Clinician-Patient conversations in radiation oncology clinics.
Int J Med Inform. 2023 Aug;176:105112. doi: 10.1016/j.ijmedinf.2023.105112. Epub 2023 Jun 1.
4
Automatic Speech Recognition in Primary Progressive Apraxia of Speech.
J Speech Lang Hear Res. 2024 Sep 12;67(9):2964-2976. doi: 10.1044/2024_JSLHR-24-00049. Epub 2024 Aug 6.
5
Combining automatic speech recognition with semantic natural language processing in schizophrenia.
Psychiatry Res. 2023 Jul;325:115252. doi: 10.1016/j.psychres.2023.115252. Epub 2023 May 16.
6
Heterophonic speech recognition using composite phones.
Springerplus. 2016 Nov 24;5(1):2008. doi: 10.1186/s40064-016-3332-9. eCollection 2016.
7
The impact of automatic speech recognition technology on second language pronunciation and speaking skills of EFL learners: a mixed methods investigation.
Front Psychol. 2023 Aug 16;14:1210187. doi: 10.3389/fpsyg.2023.1210187. eCollection 2023.
8
Phonological feature-based speech recognition system for pronunciation training in non-native language learning.
J Acoust Soc Am. 2018 Jan;143(1):98. doi: 10.1121/1.5017834.
9
Advances in Completely Automated Vowel Analysis for Sociophonetics: Using End-to-End Speech Recognition Systems With DARLA.
Front Artif Intell. 2021 Sep 24;4:662097. doi: 10.3389/frai.2021.662097. eCollection 2021.
10
End-to-End Automatic Pronunciation Error Detection Based on Improved Hybrid CTC/Attention Architecture.
Sensors (Basel). 2020 Mar 25;20(7):1809. doi: 10.3390/s20071809.

Cited by

1
Usefulness of Automatic Speech Recognition Assessment of Children With Speech Sound Disorders: Validation Study.
J Med Internet Res. 2025 Jan 14;27:e60520. doi: 10.2196/60520.