Singla Karan, Chen Zhuohao, Flemotomos Nikolaos, Gibson James, Can Dogan, Atkins David C, Narayanan Shrikanth
Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA.
Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA.
Interspeech. 2018 Sep;2018:3413-3417. doi: 10.21437/interspeech.2018-2551.
In this paper, we present an approach for predicting utterance-level behaviors in psychotherapy sessions using both speech and lexical features. We train long short-term memory (LSTM) networks with an attention mechanism using words, both manually and automatically transcribed, and word-level prosodic features to predict the annotated behaviors. We demonstrate that prosodic features provide discriminative information relevant to the behavior task and show that they improve prediction when fused with automatically derived lexical features. Additionally, we investigate the weights of the attention mechanism to determine which words and prosodic patterns are important to the behavior prediction task.
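The word-level fusion and attention weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the concatenation-based fusion, the dot-product scoring, the function names (`fuse_word_features`, `attention_pool`), and every number below are assumptions chosen for illustration, and the fused vectors stand in for LSTM hidden states that a real system would compute.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_word_features(lexical, prosodic):
    """Concatenate each word's lexical vector with its prosodic vector (assumed fusion)."""
    if len(lexical) != len(prosodic):
        raise ValueError("need one prosodic vector per word")
    return [lex + pro for lex, pro in zip(lexical, prosodic)]

def attention_pool(states, query):
    """Score each word-level state against a query vector, then average by softmax weight."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return weights, context

# Toy utterance of three words; all values are made up for illustration.
lexical = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.4]]   # hypothetical word embeddings
prosodic = [[0.0], [1.0], [0.2]]                 # hypothetical per-word prosodic summary
states = fuse_word_features(lexical, prosodic)   # stand-in for LSTM hidden states
query = [1.0, 0.0, 2.0]                          # hypothetical learned attention vector
weights, context = attention_pool(states, query)
```

Inspecting `weights` shows which word positions contribute most to the utterance representation `context`, which is the kind of analysis the abstract refers to when it examines attention weights.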