Leveraging Linguistic Context in Dyadic Interactions to Improve Automatic Speech Recognition for Children.

Authors

Kumar Manoj, Kim So Hyun, Lord Catherine, Lyon Thomas D, Narayanan Shrikanth

Affiliations

Signal Analysis and Interpretation Lab, University of Southern California.

Center for Autism and the Developing Brain, Weill Cornell Medicine.

Publication information

Comput Speech Lang. 2020 Sep;63. doi: 10.1016/j.csl.2020.101101. Epub 2020 Apr 16.

DOI: 10.1016/j.csl.2020.101101
PMID: 32431473
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7236760/
Abstract

Automatic speech recognition for child speech has long been considered a more challenging problem than for adult speech. Various contributing factors have been identified, such as larger acoustic speech variability including mispronunciations due to continuing biological changes in growth, developing vocabulary and linguistic skills, and scarcity of training corpora. A further challenge arises when dealing with spontaneous speech of children involved in a conversational interaction, and especially when the child may have limited or impaired communication ability. This includes health applications, one of the motivating domains of this paper, that involve goal-oriented dyadic interactions between a child and clinician/adult social partner as a part of behavioral assessment. In this work, we use linguistic context information from the interaction to adapt speech recognition models for children's speech. Specifically, spoken language from the interacting adult provides the context for the child's speech. We propose two methods to exploit this context: lexical repetitions and semantic response generation. For the latter, we make use of sequence-to-sequence models that learn to predict the target child utterance given context adult utterances. Long-term context is incorporated in the model by propagating the cell-state across the duration of the conversation. We use interpolation techniques to adapt language models at the utterance level, and analyze the effect of length and direction of context (forward and backward). Two different domains are used in our experiments to demonstrate the generalized nature of our methods - interactions between a child with ASD and an adult social partner in a play-based, naturalistic setting, and forensic interviews between a child and a trained interviewer. In both cases, context-adapted models yield significant improvement (up to 10.71% in absolute word error rate) over the baseline and perform consistently across context windows and directions. Using statistical analysis, we investigate the effect of source-based (adult) and target-based (child) factors on adaptation methods. Our results demonstrate the applicability of our modeling approach in improving child speech recognition by employing information transfer from the adult interlocutor.
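
The abstract mentions adapting language models at the utterance level through interpolation, with the adult's speech supplying the context. The following is a minimal sketch of that general idea, not the authors' implementation: it linearly interpolates a background unigram model with a unigram model built from a single adult utterance. The toy sentences, the unigram order, the add-alpha smoothing, and the weight `lam` are all illustrative assumptions; the abstract does not specify any of them.

```python
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary.

    Assumes every token in `tokens` is present in `vocab`.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(background, context, lam):
    """Linear interpolation: P(w) = (1 - lam) * P_bg(w) + lam * P_ctx(w)."""
    return {w: (1.0 - lam) * background[w] + lam * context[w] for w in background}


# Toy data (hypothetical, not taken from the paper).
background_corpus = "i want the ball i like the car the dog runs".split()
adult_utterance = "do you want to play with the dinosaur".split()
vocab = sorted(set(background_corpus) | set(adult_utterance))

p_bg = unigram_lm(background_corpus, vocab)   # generic child-speech model
p_ctx = unigram_lm(adult_utterance, vocab)    # context from the adult turn
p_adapted = interpolate(p_bg, p_ctx, lam=0.3)  # lam is an assumed weight

# Words the adult just used gain probability mass in the adapted model.
print(p_adapted["dinosaur"] > p_bg["dinosaur"])
```

In this sketch a new context model would be built for each child utterance from the neighbouring adult turn(s), which is one simple way to exploit lexical repetition; the context window length and direction studied in the paper correspond to which adult turns are pooled into `adult_utterance`.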
