Chen Zhuohao, Flemotomos Nikolaos, Singla Karan, Creed Torrey A, Atkins David C, Narayanan Shrikanth
Signal Analysis and Interpretation Lab, University of Southern California, Los Angeles, CA, USA.
Interactions LLC, Los Angeles, CA, USA.
Comput Speech Lang. 2022 Sep;75. doi: 10.1016/j.csl.2022.101380. Epub 2022 Mar 28.
Text-based computational approaches for assessing the quality of psychotherapy are being developed to support quality assurance and clinical training. However, because typical conversation-based therapy sessions are long and annotated modeling resources are limited, computational methods largely rely on frequency-based lexical features or dialogue acts to assess overall session-level characteristics. In this work, we propose a hierarchical framework to automatically evaluate the quality of transcribed Cognitive Behavioral Therapy (CBT) interactions. Given the richly dynamic nature of spoken dialogue within a talk therapy session, we model the overall session-level quality as a function of local variations across the interaction. To implement this, we divide each psychotherapy session into conversation segments and initialize the segment-level qualities with the session-level scores. First, we produce segment embeddings by fine-tuning a BERT-based model and predict segment-level (local) quality scores. These embeddings are used as the lower-level input to a Bidirectional LSTM-based neural network that predicts the session-level (global) quality estimates. In particular, we model the global quality as a linear function of the local quality scores, which allows us to update the segment-level quality estimates based on the session-level quality prediction. These newly estimated segment-level scores benefit the BERT fine-tuning process, which in turn yields better segment embeddings. We evaluate the proposed framework on automatically derived transcriptions of real-world CBT clinical recordings to predict session-level behavior codes. The results indicate that our approach improves evaluation accuracy for most codes in both regression and classification tasks.
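The score bookkeeping described in the abstract (segmenting a session, initializing segment-level scores from the session-level score, and treating the global score as a linear function of the local scores) can be illustrated with a minimal sketch. The sketch below does not include the BERT or BiLSTM components, and every name in it (`split_into_segments`, `init_segment_scores`, `session_score_from_segments`, `refine_segment_scores`) is hypothetical. In particular, the uniform-shift update rule is an assumption chosen for illustration; the abstract does not specify how the authors update the segment-level estimates.

```python
from typing import List, Sequence


def split_into_segments(utterances: Sequence[str], segment_size: int) -> List[List[str]]:
    """Split a session transcript into consecutive segments of up to
    `segment_size` utterances (fixed-size chunking is an assumption)."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [list(utterances[i:i + segment_size])
            for i in range(0, len(utterances), segment_size)]


def init_segment_scores(num_segments: int, session_score: float) -> List[float]:
    """Initialize every segment-level (local) score with the session-level score."""
    return [float(session_score)] * num_segments


def session_score_from_segments(local_scores: Sequence[float],
                                weights: Sequence[float],
                                bias: float = 0.0) -> float:
    """Global quality as a linear function of the local quality scores."""
    if len(local_scores) != len(weights):
        raise ValueError("local_scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, local_scores)) + bias


def refine_segment_scores(predicted_local: Sequence[float],
                          weights: Sequence[float],
                          session_score: float,
                          bias: float = 0.0) -> List[float]:
    """Hypothetical update rule (not taken from the paper): shift all predicted
    local scores by one constant so that their linear aggregate equals
    `session_score`."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    residual = session_score - session_score_from_segments(predicted_local, weights, bias)
    shift = residual / total_weight
    return [s + shift for s in predicted_local]


# Usage with made-up numbers: a 7-utterance session scored 4.0 overall.
utterances = [f"utterance {i}" for i in range(7)]
segments = split_into_segments(utterances, segment_size=3)
initial_scores = init_segment_scores(len(segments), session_score=4.0)

# Pretend a segment-level model produced these local predictions.
predicted_local = [3.0, 5.0, 4.5]
weights = [1.0 / len(segments)] * len(segments)
refined = refine_segment_scores(predicted_local, weights, session_score=4.0)
```

In this sketch the refined local scores keep the relative ordering of the model's predictions while remaining consistent with the session-level score, which is one simple way to obtain updated segment-level targets for a further round of fine-tuning.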