
Evaluating Generative Pretrained Transformer (GPT) models for suicide risk assessment in synthetic patient journal entries.

Author Information

Holley Dan, Daly Brian, Beverly Briana, Wamsley Blaken, Brooks Amanda, Zaubler Tom

Affiliations

Clinical Operation, NeuroFlow, Philadelphia, PA, USA.

Drexel University, Philadelphia, PA, USA.

Publication Information

BMC Psychiatry. 2025 Aug 1;25(1):753. doi: 10.1186/s12888-025-07088-5.


DOI: 10.1186/s12888-025-07088-5
PMID: 40750858
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12317552/
Abstract

Over 700,000 individuals die by suicide globally each year, with rapid progression from suicidal ideation (SI) to attempt often precluding opportunities for intervention. Digital behavioral health (DBH) platforms offer novel means of collecting SI indicators outside the clinic, but the actionable utility of these data may be limited by clinician-dependent workflows such as reviewing patients' journaling exercises for signs of SI. Large language models (LLMs) provide a methodology to streamline this task by rapidly risk-stratifying text based on the presence and severity of SI; however, this application has yet to be reliably evaluated. To test this approach, we first generated and validated a corpus of 125 synthetic journal responses to prompts from a real-world DBH platform. The responses varied on the presence and severity of suicidal ideation, readability, length, use of emojis, and other common language features, allowing for over 1 trillion feature permutations. Next, five collaborating behavioral health experts worked independently to stratify these responses as no-, low-, moderate-, or high-risk SI. Finally, we risk-stratified the responses using several tailored implementations of OpenAI's Generative Pretrained Transformer (GPT) models and compared the results to those of our raters. Using clinician consensus as "ground truth," our ensemble LLM performed significantly above chance (30.38%) in exact risk-assessment agreement (65.60%; χ² = 86.58). The ensemble model also aligned with 92% of clinicians' "do/do not intervene" decisions (Cohen's Kappa = 0.84) and achieved 94% sensitivity and 91% specificity in that task. Additional results of precision-recall, time-to-decision, and cost analyses are reported. While further testing and exploration of ethical considerations remain critical, our results offer preliminary evidence that LLM-powered risk stratification can serve as a powerful and cost-effective tool to enhance suicide prevention frameworks.
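The abstract describes a pipeline in which each journal entry is risk-stratified by several GPT implementations and the outputs are combined into an ensemble decision. Below is a minimal sketch of that idea using the OpenAI Python client; the prompt wording, the model names, the tie-breaking rule, and the moderate-or-high intervention threshold are illustrative assumptions, not the authors' published configuration.

```python
# Sketch of LLM-based SI risk stratification with a cross-model ensemble.
# Prompt text, model list, and thresholds are assumptions for illustration.
from collections import Counter

from openai import OpenAI  # pip install openai

RISK_LEVELS = ["no", "low", "moderate", "high"]  # ordered lowest to highest

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def stratify_once(entry: str, model: str) -> str:
    """Ask one GPT model to rate the SI risk of a journal entry."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system",
             "content": ("You are a behavioral-health triage assistant. "
                         "Classify the suicidal-ideation risk of the user's "
                         "journal entry. Answer with exactly one word: "
                         "no, low, moderate, or high.")},
            {"role": "user", "content": entry},
        ],
    )
    label = resp.choices[0].message.content.strip().lower()
    return label if label in RISK_LEVELS else "moderate"  # conservative fallback


def stratify_ensemble(entry: str,
                      models=("gpt-4o", "gpt-4o-mini", "gpt-4-turbo")) -> str:
    """Majority vote across models; ties resolve toward the higher risk level."""
    votes = Counter(stratify_once(entry, m) for m in models)
    top = max(votes.values())
    return [lvl for lvl in RISK_LEVELS if votes.get(lvl, 0) == top][-1]


def should_intervene(risk: str) -> bool:
    """Collapse the four-level rating into a binary do/do-not-intervene flag."""
    return risk in ("moderate", "high")
```

Resolving ties toward the higher risk level mirrors the safety-first framing of the task: in suicide risk triage, a false negative is far costlier than a false positive.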

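The evaluation compares model output to clinician consensus using exact four-level agreement, Cohen's Kappa on the binary intervene decision, and sensitivity/specificity. A short worked sketch of those metrics with scikit-learn, on made-up labels rather than the study's data:

```python
# Worked sketch of the agreement metrics reported in the abstract.
# The truth/model labels below are hypothetical, not the study's ratings.
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Clinician consensus ("ground truth") vs. ensemble output for six entries.
truth = ["no", "low", "high", "moderate", "no", "high"]
model = ["no", "low", "high", "low",      "no", "high"]

# Exact four-level agreement (abstract: 65.60% vs. 30.38% chance).
exact = sum(t == m for t, m in zip(truth, model)) / len(truth)
print(f"exact agreement: {exact:.2%}")

# Binary do/do-not-intervene decision: moderate or high risk -> intervene.
t_bin = [t in ("moderate", "high") for t in truth]
m_bin = [m in ("moderate", "high") for m in model]

kappa = cohen_kappa_score(t_bin, m_bin)           # abstract reports 0.84
tn, fp, fn, tp = confusion_matrix(t_bin, m_bin).ravel()
sensitivity = tp / (tp + fn)                      # abstract reports 94%
specificity = tn / (tn + fp)                      # abstract reports 91%
print(f"kappa={kappa:.2f} sensitivity={sensitivity:.0%} "
      f"specificity={specificity:.0%}")
```

Cohen's Kappa is used rather than raw agreement because it discounts the agreement expected by chance, which matters here given the imbalanced distribution of risk levels.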

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f18/12317552/1a00d94b9986/12888_2025_7088_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f18/12317552/eaffc8eb3980/12888_2025_7088_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f18/12317552/af422202d120/12888_2025_7088_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f18/12317552/8776b880fb54/12888_2025_7088_Fig4_HTML.jpg

Similar Articles

[1]
Evaluating Generative Pretrained Transformer (GPT) models for suicide risk assessment in synthetic patient journal entries.

BMC Psychiatry. 2025-8-1

[2]
Prevention of self-harm and suicide in young people up to the age of 25 in education settings.

Cochrane Database Syst Rev. 2024-12-20

[3]
Improving Suicidal Ideation Detection in Social Media Posts: Topic Modeling and Synthetic Data Augmentation Approach.

JMIR Form Res. 2025-6-11

[4]
Use of Large Language Models to Classify Epidemiological Characteristics in Synthetic and Real-World Social Media Posts About Conjunctivitis Outbreaks: Infodemiology Study.

J Med Internet Res. 2025-7-2

[5]
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022-5-20

[6]
Applications of Large Language Models in the Field of Suicide Prevention: Scoping Review.

J Med Internet Res. 2025-1-23

[7]
Why Are Autistic People More Likely to Experience Suicidal Thoughts? Applying the Integrated Motivational-Volitional Model with Autistic Adults.

Autism Adulthood. 2024-9-16

[8]
Sexual Harassment and Prevention Training

2025-1

[9]
Interventions to improve safe and effective medicines use by consumers: an overview of systematic reviews.

Cochrane Database Syst Rev. 2014-4-29

[10]
Technological aids for the rehabilitation of memory and executive functioning in children and adolescents with acquired brain injury.

Cochrane Database Syst Rev. 2016-7-1

