Department of Psychiatry and Behavioral Sciences, University of Washington, 1100 NE 45th St., Ste. 300, Seattle, WA, USA.
Implement Sci. 2014 Apr 24;9:49. doi: 10.1186/1748-5908-9-49.
Behavioral interventions such as psychotherapy are leading, evidence-based practices for a variety of problems (e.g., substance abuse), but the evaluation of provider fidelity to behavioral interventions is limited by the need for human judgment. The current study evaluated the accuracy of statistical text classification in replicating human-based judgments of provider fidelity in one specific psychotherapy: motivational interviewing (MI).
Participants (n = 148) came from five previously conducted randomized trials and were either primary care patients at a safety-net hospital or university students. To be eligible for the original studies, participants met criteria for either problematic drug or alcohol use. All participants received a type of brief motivational interview, an evidence-based intervention for alcohol and substance use disorders. The Motivational Interviewing Skills Code, a standard measure of MI provider fidelity based on human ratings, was used to evaluate all therapy sessions. A text classification approach called a labeled topic model was used to learn associations between human-based fidelity ratings and MI session transcripts; the trained model was then used to generate codes for new sessions. The primary comparison was the accuracy of model-based codes relative to human-based codes.
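The abstract does not include the labeled topic model itself. As a rough illustration of the underlying idea (learning word-to-code associations from labeled transcripts, then coding new utterances), the sketch below uses a simple multinomial naive Bayes classifier, a deliberately simpler stand-in for the labeled topic model; the utterances and code names are invented for the example and do not come from the study.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Learn word-code associations from (utterance, code) pairs.
    A naive Bayes stand-in for the labeled topic model in the study."""
    word_counts = defaultdict(Counter)  # per-code word frequencies
    code_counts = Counter()             # per-code utterance counts
    vocab = set()
    for text, code in examples:
        tokens = text.lower().split()
        word_counts[code].update(tokens)
        code_counts[code] += 1
        vocab.update(tokens)
    return word_counts, code_counts, vocab

def predict(model, text):
    """Assign the code with the highest (Laplace-smoothed) log posterior."""
    word_counts, code_counts, vocab = model
    total = sum(code_counts.values())
    best, best_lp = None, float("-inf")
    for code in code_counts:
        lp = math.log(code_counts[code] / total)
        denom = sum(word_counts[code].values()) + len(vocab)
        for tok in text.lower().split():
            lp += math.log((word_counts[code][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = code, lp
    return best

# Toy demo (hypothetical utterances and simplified code labels):
examples = [
    ("what brings you in today", "question"),
    ("how do you feel about drinking", "question"),
    ("it sounds like you want to change", "reflection"),
    ("you feel torn about this", "reflection"),
]
model = train(examples)
```

In the study, the model was trained at the talk-turn level against human MISC ratings; this sketch only shows the general pattern of supervised utterance coding.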
Receiver operating characteristic (ROC) analyses of model-based codes showed reasonably strong sensitivity and specificity relative to human raters (range of area under the ROC curve (AUC) scores: 0.62 to 0.81; average AUC: 0.72). Agreement with human raters was evaluated at the level of individual talk turns as well as code tallies for an entire session. Generated codes agreed more reliably with human codes at the session-tally level, and reliability varied substantially across individual codes.
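The AUC figures above can be read as the probability that a randomly chosen positively coded talk turn receives a higher model score than a randomly chosen negative one (the Mann-Whitney interpretation). A minimal sketch of that computation, with made-up labels and scores rather than data from the study:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney interpretation:
    the probability that a random positive example outscores a random
    negative one, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values: human labels vs. hypothetical model scores.
# 8 of the 9 positive/negative pairs are ranked correctly -> AUC = 8/9.
example_auc = auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.3, 0.6, 0.2, 0.1])
```

An AUC of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation, which frames the study's reported range of 0.62 to 0.81.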
To scale up the evaluation of behavioral interventions, technological solutions will be required. The current study demonstrated preliminary, encouraging findings regarding the utility of statistical text classification in bridging this methodological gap.