Reasoning language models for more transparent prediction of suicide risk.

Author information

McCoy Thomas H, Perlis Roy H

Affiliations

Center for Quantitative Health, Massachusetts General Hospital, Boston, Massachusetts, USA.

Department of Psychiatry, Harvard Medical School, Boston, Massachusetts, USA.

Publication information

BMJ Ment Health. 2025 May 11;28(1):e301654. doi: 10.1136/bmjment-2025-301654.

Abstract

BACKGROUND

We previously demonstrated that a large language model could estimate suicide risk using hospital discharge notes.

OBJECTIVE

With the emergence of reasoning models that can be run on consumer-grade hardware, we investigated whether these models can approximate the performance of much larger and costlier models.

METHODS

From 458 053 adults hospitalised at one of two academic medical centres between 4 January 2005 and 2 January 2014, we identified 1995 who died by suicide or accident and matched each with 5 control individuals. We used Llama-DeepSeek-R1 8B to generate predictions of risk. Beyond discrimination and calibration, we examined aspects of model reasoning (that is, the topics in the chain of thought) associated with correct or incorrect predictions.
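
To make the workflow concrete, below is a minimal sketch of running a distilled reasoning model of this class on local hardware and separating its chain of thought from its final answer. The abstract does not specify the authors' prompt, output format or serving stack, so the model identifier (the publicly released DeepSeek-R1 distillation onto Llama 3.1 8B), the prompt wording and the 0-100 score format are all assumptions for illustration.

```python
# Illustrative sketch only: the prompt, score format and parsing below are
# assumptions, not the paper's protocol. MODEL_ID is the public
# DeepSeek-R1 distillation onto Llama 3.1 8B.
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def estimate_risk(discharge_note: str) -> tuple[str, float]:
    """Return (chain of thought, 0-100 risk score) for one discharge note."""
    prompt = (
        "Rate this patient's risk of death by suicide or accident in the "
        "years after discharge on a 0-100 scale. Reason step by step, then "
        "answer with a single number.\n\nDischarge note:\n" + discharge_note
    )
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=2048,
                         do_sample=True, temperature=0.6)
    text = tokenizer.decode(out[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)
    # R1-style models emit their reasoning before a closing </think> tag;
    # the final answer follows it.
    reasoning, _, answer = text.partition("</think>")
    nums = re.findall(r"\d+(?:\.\d+)?", answer or reasoning)
    score = float(nums[-1]) if nums else float("nan")
    return reasoning.replace("<think>", "").strip(), score
```

Keeping both return values is what enables the reasoning analysis described above: the captured chain of thought can be topic-coded and cross-tabulated against whether the score proved correct.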

FINDINGS

The cohort included 1995 individuals who died by suicide or accidental death and 9975 individuals matched 5:1, totalling 11 954 discharges and 58 933 person-years of follow-up. In Fine and Gray regression, hazard as estimated by the Llama3-distilled model was significantly associated with observed risk (unadjusted HR 4.65, 95% CI 3.58 to 6.04). The corresponding c-statistic was 0.64 (0.63 to 0.65), modestly poorer than that of the GPT-4o model (0.67, 0.66 to 0.68). In chain-of-thought reasoning, topics including Substance Abuse, Surgical Procedure, and Age-related Comorbidities were associated with correct predictions, while Fall-related Injury was associated with incorrect predictions.
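
As a rough illustration of the discrimination analysis reported here, the sketch below computes a c-statistic and an unadjusted hazard ratio from per-discharge model scores. The file and column names are hypothetical, and because lifelines does not implement Fine and Gray subdistribution hazards, an ordinary Cox fit stands in for the paper's competing-risks model (R's cmprsk::crr provides the actual Fine-Gray estimator).

```python
# Minimal evaluation sketch under assumed data layout: one row per discharge
# with follow-up time in years, an event indicator and two model risk scores.
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

df = pd.read_csv("cohort.csv")  # hypothetical: time_years, death, llama_score, gpt4o_score

# c-statistic: concordance_index treats higher scores as longer survival,
# so risk scores are negated.
c_llama = concordance_index(df["time_years"], -df["llama_score"], df["death"])
c_gpt4o = concordance_index(df["time_years"], -df["gpt4o_score"], df["death"])
print(f"c-statistic: Llama {c_llama:.2f}, GPT-4o {c_gpt4o:.2f}")

# Unadjusted hazard ratio for the model score; a cause-specific Cox model is
# used here as a stand-in for the paper's Fine and Gray regression.
cph = CoxPHFitter().fit(df[["time_years", "death", "llama_score"]],
                        duration_col="time_years", event_col="death")
print(cph.summary[["exp(coef)", "exp(coef) lower 95%", "exp(coef) upper 95%"]])
```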

CONCLUSIONS

Application of a reasoning model using local, consumer-grade hardware only modestly diminished performance in stratifying suicide risk.

CLINICAL IMPLICATIONS

Smaller models can yield more secure, scalable and transparent risk prediction.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b0d/12067846/756c5380daba/bmjment-28-1-g001.jpg
