McCoy Thomas H, Perlis Roy H
Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.
Department of Psychiatry, Harvard Medical School, Boston, MA, USA.
J Mood Anxiety Disord. 2025 Jan 31;10:100109. doi: 10.1016/j.xjmad.2025.100109. eCollection 2025 Jun.
We investigated whether large language models can stratify risk for suicide following hospital discharge. We drew on a cohort of 458,053 adults discharged from two academic medical centers between January 4, 2005 and January 2, 2014, linked to administrative vital status data. From this sample, each of the 1995 individuals who died by suicide or accident was matched with 5 control individuals on the basis of age, sex, race and ethnicity, admitting hospital, insurance, comorbidity index, and discharge year. We applied a HIPAA-compliant large language model (gpt-4-1106-preview) to estimate risk for suicide based on narrative discharge summaries. In the resulting cohort (n = 11,970), median age was 57 (IQR 44-76); 4536 (38%) were women; 348 (3%) had a primary psychiatric admission diagnosis. For the model-predicted risk, time to 90% survival was 1588 days (IQR 1374-1905) in the lowest-risk quartile, 1432 (IQR 1157-1651) in the 2nd quartile, 661 (IQR 538-820) in the 3rd quartile, and 302 (IQR 260-362) in the top quartile (p < .001). In Fine and Gray competing risk regression, predicted hazard was significantly associated with observed risk (unadjusted HR 7.66 [95% CI 6.40-9.27]; adjusted for sociodemographic features and utilization, HR 8.86 [95% CI 7.00-11.2]). Estimated risk scores were significantly greater among individuals who were Black or Hispanic (p < .005 for each, versus white individuals). Overall, a large language model (LLM) was able to stratify risk for suicide and accidental death among individuals discharged from academic medical centers beyond that afforded by simple sociodemographic and clinical features.
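As a rough illustration of two steps the abstract describes, the sketch below checks the cohort size implied by 1:5 matching and shows one way predicted risk scores could be split into quartiles. It is a minimal sketch, not the authors' code: the function name `assign_quartiles`, the choice of inclusive quantile cut points, and the example scores are all assumptions made for illustration.

```python
from statistics import quantiles


def assign_quartiles(scores):
    """Return a quartile label (1 = lowest risk, 4 = highest) for each score.

    Hypothetical helper; the study's actual stratification code is not given.
    """
    # Three cut points divide the scores into four groups.
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for s in scores:
        if s <= q1:
            labels.append(1)
        elif s <= q2:
            labels.append(2)
        elif s <= q3:
            labels.append(3)
        else:
            labels.append(4)
    return labels


# Cohort size implied by 1:5 matching: each case plus its five controls.
cases = 1995
controls_per_case = 5
cohort_size = cases * (1 + controls_per_case)  # matches the reported n = 11,970

# Made-up scores for illustration only; these are not data from the study.
example_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
example_labels = assign_quartiles(example_scores)
```

The survival comparison and the Fine and Gray regression reported above would need a dedicated survival-analysis package and are not reproduced here.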