Gupta Rahul, Bone Daniel, Lee Sungbok, Narayanan Shrikanth
Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California, 3710 McClintock Avenue, Los Angeles, CA 90089, USA.
Comput Speech Lang. 2016 May;37:47-66. doi: 10.1016/j.csl.2015.09.003. Epub 2015 Oct 23.
Child engagement is defined as the interaction of a child with his/her environment in a contextually appropriate manner. Engagement behavior in children is linked to socio-emotional and cognitive state assessment, with enhanced engagement associated with improved skills. However, the vast majority of studies rely solely, and often implicitly, on subjective perceptual measures of engagement. Automatic quantification could help researchers and clinicians objectively interpret engagement with respect to a target behavior or condition, and could further inform mechanisms for improving engagement in various settings. In this paper, we present an engagement prediction system based exclusively on vocal cues observed during a structured interaction, comprising several tasks, between a child and a psychologist. Specifically, we derive prosodic cues that capture engagement levels across the various tasks. Our experiments suggest that a child's engagement is reflected not only in the child's vocalizations but also in the speech of the interacting psychologist. Moreover, we show that prosodic cues are informative of engagement not only when characterized over the entire task (i.e., global cues) but also in short-term patterns (i.e., local cues). We perform a classification experiment that assigns a child's engagement to one of three discrete levels, achieving an unweighted average recall of 55.8% (chance is 33.3%). While the systems using global cues and local cues each predict engagement at a statistically significant level, we obtain the best results by fusing the two components. We further analyze the cues at the local and global levels to gain insight into how specific prosodic patterns relate to engagement. We observe that although the performance of our model varies with the task setting and the interacting psychologist, there exist universal prosodic patterns reflective of engagement.
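The reported 55.8% is an unweighted average recall (UAR): the recall of each engagement level is computed separately and the per-level recalls are averaged with equal weight, so frequent levels cannot dominate the score. This is also why chance is 33.3% for three levels: a predictor that ignores its input and always outputs one level has recall 1 on that level and 0 on the other two, giving 1/3. The following is a minimal illustrative sketch of the metric, not the authors' evaluation code; the integer labels 0, 1, 2 and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls; every class present in y_true gets equal weight."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = defaultdict(int)  # number of true samples per class
    hits = defaultdict(int)     # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical, made-up labels for three engagement levels (0, 1, 2); not data from the paper.
y_true = [0] * 6 + [1] * 3 + [2]
y_majority = [0] * 10  # always predict the most frequent level
accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
print(f"accuracy={accuracy:.3f}, UAR={unweighted_average_recall(y_true, y_majority):.3f}")
# prints "accuracy=0.600, UAR=0.333"
```

On this imbalanced toy set, the always-majority predictor reaches 60% accuracy but only chance-level UAR, which illustrates why UAR is a stricter measure than plain accuracy when engagement levels are unevenly distributed.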