Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:1836-1839. doi: 10.1109/EMBC46164.2021.9629694.
Cognitive Behavioral Therapy (CBT) is a goal-oriented psychotherapy for mental health concerns implemented in a conversational setting. The quality of a CBT session is typically assessed by trained human raters who manually assign pre-defined session-level behavioral codes. In this paper, we develop an end-to-end pipeline that converts speech audio to diarized and transcribed text and extracts linguistic features to code the CBT sessions automatically. We investigate both word-level and utterance-level features and propose feature fusion strategies to combine them. The utterance-level features include dialog act tags as well as behavioral codes drawn from Motivational Interviewing (MI), another well-known talk psychotherapy. We propose a novel method to augment the word-based features with the utterance-level tags for subsequent CBT code estimation. Experiments show that our new fusion strategy outperforms every feature set studied, whether used individually or fused by direct concatenation. We also find that incorporating a sentence segmentation module further improves overall system performance, given the preponderance of multi-utterance conversational turns in CBT sessions.
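The abstract contrasts direct concatenation of word-level and utterance-level features with a strategy that augments the word-based features with utterance-level tags, but it does not spell out the mechanics. The following is a minimal sketch of one way such a contrast could look, not the authors' implementation. The toy session, the tag names (`QUESTION`, `STATEMENT`, `REFLECTION`), the function names, and the choice to pair each word with the tag of its utterance are all assumptions made for illustration.

```python
from collections import Counter

# Toy session: (utterance-level tag, transcribed text).
# Tags and text are invented for illustration only.
session = [
    ("QUESTION", "how was your week"),
    ("STATEMENT", "my week was hard"),
    ("REFLECTION", "your week was hard"),
]


def word_counts(utterances):
    """Word-level features: bag-of-words counts over the whole session."""
    return Counter(w for _, text in utterances for w in text.split())


def tag_counts(utterances):
    """Utterance-level features: how often each tag occurs in the session."""
    return Counter(tag for tag, _ in utterances)


def concat_fusion(utterances):
    """Direct concatenation: word and tag features kept side by side."""
    feats = {f"w:{w}": c for w, c in word_counts(utterances).items()}
    feats.update({f"t:{t}": c for t, c in tag_counts(utterances).items()})
    return feats


def tag_augmented_fusion(utterances):
    """Assumed augmentation: each word is paired with its utterance's tag,
    so the same word counts separately under different tags."""
    return Counter(
        f"{w}|{tag}" for tag, text in utterances for w in text.split()
    )


concat = concat_fusion(session)
augmented = tag_augmented_fusion(session)
```

In this sketch, concatenation records that "week" occurs three times and that one question was asked, but not which words appeared in the question. The tag-augmented variant keeps that association, at the cost of a larger and sparser feature space.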