Swaminathan Akshay, López Iván, Mar Rafael Antonio Garcia, Heist Tyler, McClintock Tom, Caoili Kaitlin, Grace Madeline, Rubashkin Matthew, Boggs Michael N, Chen Jonathan H, Gevaert Olivier, Mou David, Nock Matthew K
Cerebral Inc, Claymont, DE, USA.
Stanford University School of Medicine, Stanford, CA, USA.
NPJ Digit Med. 2023 Nov 21;6(1):213. doi: 10.1038/s41746-023-00951-3.
Patients experiencing mental health crises often seek help through messaging-based platforms, but may face long wait times due to limited message triage capacity. Here we build and deploy a machine-learning-enabled system to improve response times to crisis messages in a large, national telehealth provider network. We train a two-stage natural language processing (NLP) system with keyword filtering followed by logistic regression on 721 electronic medical record chat messages, of which 32% are potential crises (suicidal/homicidal ideation, domestic violence, or non-suicidal self-injury). Model performance is evaluated on a retrospective test set (4/1/21-4/1/22, N = 481) and a prospective test set (10/1/22-10/31/22, N = 102,471). In the retrospective test set, the model has an AUC of 0.82 (95% CI: 0.78-0.86), sensitivity of 0.99 (95% CI: 0.96-1.00), and PPV of 0.35 (95% CI: 0.309-0.4). In the prospective test set, the model has an AUC of 0.98 (95% CI: 0.966-0.984), sensitivity of 0.98 (95% CI: 0.96-0.99), and PPV of 0.66 (95% CI: 0.626-0.692). The daily median time from message receipt to crisis specialist triage ranges from 8 to 13 min, compared to 9 h before the deployment of the system. We demonstrate that an NLP-based machine learning model can reliably identify potential crisis chat messages in a telehealth setting. Our system integrates into existing clinical workflows, suggesting that with appropriate training, humans can successfully leverage ML systems to facilitate triage of crisis messages.
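As a rough illustration of the two-stage design described in the abstract (keyword filtering followed by logistic regression), the following is a minimal sketch and not the authors' implementation. The keyword list, the weights, the bias, the 0.5 threshold, and all function names are hypothetical placeholders, since the abstract does not publish them; a real system would learn the weights from labeled messages. The last function shows how sensitivity and PPV, two of the metrics reported above, follow from true-positive, false-positive, and false-negative counts.

```python
import math
import re

# Hypothetical stage-1 keyword list (assumption: the abstract does not list the real terms).
CRISIS_KEYWORDS = ("suicide", "kill", "hurt myself", "abuse")

# Hypothetical stage-2 bag-of-words weights and bias (assumption: hand-picked
# here for illustration; a trained logistic regression would estimate them).
WEIGHTS = {"suicide": 2.5, "kill": 1.5, "myself": 1.0, "abuse": 1.2,
           "movie": -2.0, "refill": -1.5}
BIAS = -1.0


def passes_keyword_filter(message: str) -> bool:
    """Stage 1: keep only messages containing at least one keyword."""
    text = message.lower()
    return any(keyword in text for keyword in CRISIS_KEYWORDS)


def crisis_probability(message: str) -> float:
    """Stage 2: logistic function applied to a weighted sum of token scores."""
    tokens = re.findall(r"[a-z']+", message.lower())
    logit = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    return 1.0 / (1.0 + math.exp(-logit))


def flag_message(message: str, threshold: float = 0.5) -> bool:
    """Flag a message only if it passes stage 1 and scores at or above the threshold."""
    if not passes_keyword_filter(message):
        return False
    return crisis_probability(message) >= threshold


def sensitivity_and_ppv(labels, predictions):
    """Return (sensitivity, PPV) from parallel lists of booleans."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

In this sketch the keyword stage is deliberately permissive and the second stage removes some of its false positives, which is one plausible way to favor sensitivity over PPV in a triage setting where a missed crisis message is the more costly error.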