White River Junction VA Medical Center, White River Junction, VT, USA.
Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
Psychol Med. 2021 Jun;51(8):1382-1391. doi: 10.1017/S0033291720000173. Epub 2020 Feb 17.
This study evaluated whether natural language processing (NLP) of psychotherapy note text provides additional accuracy over and above currently used suicide prediction models.
We used a cohort of Veterans Health Administration (VHA) users diagnosed with post-traumatic stress disorder (PTSD) between 2004 and 2013. In a case-control design, cases (those who died by suicide during the year following diagnosis) were matched to controls (those who remained alive). After conditional matching on shared mental health providers, we selected controls using 5:1 nearest-neighbor propensity matching based on the VHA's structured electronic medical record (EMR)-based suicide prediction model. For cases, psychotherapist notes were collected from diagnosis until death. For controls, psychotherapist notes were collected from diagnosis until the matched case's date of death. After ensuring similar numbers of notes, the final sample included 246 cases and 986 controls. Notes were analyzed using the Sentiment Analysis and Cognition Engine, a Python-based NLP package, and its output was evaluated using machine-learning algorithms. The area under the receiver operating characteristic curve (AUC) was calculated to determine the models' predictive accuracy.
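The matching step can be illustrated with a minimal Python sketch. The abstract does not specify the matching algorithm, so everything here is an assumption: the `Patient` record, its fields, and `match_controls` are hypothetical; "shared mental health providers" is simplified to an identical provider ID; and a greedy, without-replacement nearest-neighbor search on the structured-EMR model's predicted risk stands in for whatever procedure the authors actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str        # hypothetical patient identifier
    provider: str   # hypothetical provider ID, used as the matching stratum
    score: float    # predicted risk from the structured-EMR model (assumed available)
    is_case: bool   # True if the patient died by suicide within a year of diagnosis


def match_controls(patients, k=5):
    """Greedy k:1 nearest-neighbor matching on predicted risk, restricted to
    controls with the same provider and drawn without replacement.
    Returns {case_pid: [control_pid, ...]}; a case may receive fewer than k
    controls if its stratum runs out. Results depend on case order."""
    used = set()
    matches = {}
    cases = [p for p in patients if p.is_case]
    controls = [p for p in patients if not p.is_case]
    for case in cases:
        # Conditional match: only unused controls sharing the case's provider.
        pool = [c for c in controls
                if c.provider == case.provider and c.pid not in used]
        # Nearest neighbors = smallest absolute difference in predicted risk.
        pool.sort(key=lambda c: abs(c.score - case.score))
        chosen = pool[:k]
        used.update(c.pid for c in chosen)
        matches[case.pid] = [c.pid for c in chosen]
    return matches


# Toy data (invented for illustration, not from the study).
patients = [
    Patient("c1", "A", 0.30, True),
    Patient("n1", "A", 0.29, False),
    Patient("n2", "A", 0.35, False),
    Patient("n3", "A", 0.10, False),
    Patient("n4", "B", 0.30, False),  # different provider, never eligible for c1
]
print(match_controls(patients, k=2))  # {'c1': ['n1', 'n2']}
```

Matching on the existing model's score is what lets any remaining difference between cases and controls be attributed to information the structured model does not already capture, such as the note text.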
NLP-derived variables offered a small but significant predictive improvement (AUC = 0.58) for patients who had longer treatment duration. The small sample size limited predictive accuracy.
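To put AUC = 0.58 in context: the AUC equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. On this scale 0.5 is chance-level ranking and 1.0 is perfect separation. The sketch below computes it directly from that pairwise definition. The function name `auc` and the scores are hypothetical illustrations, not the study's models or data. Library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
def auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 1/2.
    O(n_cases * n_controls): fine for illustration, slow for large cohorts."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0   # case correctly ranked above control
            elif s_case == s_ctrl:
                wins += 0.5   # tie: half credit
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores: 5 of the 6 case-control pairs are ordered correctly.
print(auc([0.9, 0.4], [0.5, 0.3, 0.1]))  # ≈ 0.833
```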
This study identifies a novel method for measuring suicide risk over time and potentially categorizing patient subgroups with distinct risk sensitivities. Findings suggest that leveraging NLP-derived variables from psychotherapy notes offers additional predictive value over and above the VHA's state-of-the-art structured EMR-based suicide prediction model. Replication with a larger, non-PTSD-specific sample is required.