

Similar Articles

1
Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study.
JMIR Ment Health. 2024 Aug 2;11:e58129. doi: 10.2196/58129.
2
Quality of Answers of Generative Large Language Models Versus Peer Users for Interpreting Laboratory Test Results for Lay Patients: Evaluation Study.
J Med Internet Res. 2024 Apr 17;26:e56655. doi: 10.2196/56655.
3
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2):CD014217. doi: 10.1002/14651858.CD014217.
4
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
JMIR Med Educ. 2024 Feb 13;10:e51391. doi: 10.2196/51391.
5
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.
JMIR Med Educ. 2024 Feb 21;10:e51523. doi: 10.2196/51523.
6
Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values.
JMIR Ment Health. 2024 Apr 9;11:e55988. doi: 10.2196/55988.
7
Comparative Evaluation of LLMs in Clinical Oncology.
NEJM AI. 2024 May;1(5). doi: 10.1056/aioa2300151. Epub 2024 Apr 16.
8
Leveraging Large Language Models for Precision Monitoring of Chemotherapy-Induced Toxicities: A Pilot Study with Expert Comparisons and Future Directions.
Cancers (Basel). 2024 Aug 12;16(16):2830. doi: 10.3390/cancers16162830.
9
Artificial Intelligence for Anesthesiology Board-Style Examination Questions: Role of Large Language Models.
J Cardiothorac Vasc Anesth. 2024 May;38(5):1251-1259. doi: 10.1053/j.jvca.2024.01.032. Epub 2024 Feb 1.
10
A Generative Pretrained Transformer (GPT)-Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study.
JMIR Med Educ. 2024 Jan 16;10:e53961. doi: 10.2196/53961.

Cited By

1
Evaluating Generative Pretrained Transformer (GPT) models for suicide risk assessment in synthetic patient journal entries.
BMC Psychiatry. 2025 Aug 1;25(1):753. doi: 10.1186/s12888-025-07088-5.
2
Potential of ChatGPT in youth mental health emergency triage: Comparative analysis with clinicians.
PCN Rep. 2025 Jul 15;4(3):e70159. doi: 10.1002/pcn5.70159. eCollection 2025 Sep.
3
Sentiment analysis in public health: a systematic review of the current state, challenges, and future directions.
Front Public Health. 2025 Jun 20;13:1609749. doi: 10.3389/fpubh.2025.1609749. eCollection 2025.
4
The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610.
5
A Comparison of Responses from Human Therapists and Large Language Model-Based Chatbots to Assess Therapeutic Communication: Mixed Methods Study.
JMIR Ment Health. 2025 May 21;12:e69709. doi: 10.2196/69709.
6
The evolving field of digital mental health: current evidence and implementation issues for smartphone apps, generative artificial intelligence, and virtual reality.
World Psychiatry. 2025 Jun;24(2):156-174. doi: 10.1002/wps.21299.
7
The Applications of Large Language Models in Mental Health: Scoping Review.
J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284.
8
Responsible Design, Integration, and Use of Generative AI in Mental Health.
JMIR Ment Health. 2025 Jan 20;12:e70439. doi: 10.2196/70439.

References

1
A framework for language technologies in behavioral research and clinical applications: Ethical challenges, implications, and solutions.
Am Psychol. 2024 Jan;79(1):79-91. doi: 10.1037/amp0001195.
2
Natural language processing system for rapid detection and intervention of mental health crisis chat messages.
NPJ Digit Med. 2023 Nov 21;6(1):213. doi: 10.1038/s41746-023-00951-3.
3
Large language models encode clinical knowledge.
Nature. 2023 Aug;620(7972):172-180. doi: 10.1038/s41586-023-06291-2. Epub 2023 Jul 12.
4
Application of Natural Language Processing (NLP) in Detecting and Preventing Suicide Ideation: A Systematic Review.
Int J Environ Res Public Health. 2023 Jan 13;20(2):1514. doi: 10.3390/ijerph20021514.
5
A Call to Action on Assessing and Mitigating Bias in Artificial Intelligence Applications for Mental Health.
Perspect Psychol Sci. 2023 Sep;18(5):1062-1096. doi: 10.1177/17456916221134490. Epub 2022 Dec 9.
6
Social isolation and suicide risk: Literature review and perspectives.
Eur Psychiatry. 2022 Oct 11;65(1):e65. doi: 10.1192/j.eurpsy.2022.2320.
7
Feasibility and acceptability of a novel telepsychiatry-delivered precision prescribing intervention for anxiety and depression.
BMC Psychiatry. 2022 Jul 19;22(1):483. doi: 10.1186/s12888-022-04113-9.
8
Reflections on Suicidal Ideation.
Crisis. 2019 Jul;40(4):227-230. doi: 10.1027/0227-5910/a000615.
9
Identification of suicidal behavior among psychiatrically hospitalized adolescents using natural language processing and machine learning of electronic health records.
PLoS One. 2019 Feb 19;14(2):e0211116. doi: 10.1371/journal.pone.0211116. eCollection 2019.
10
Predictors of re-attempt in a cohort of suicide attempters: A survival analysis.
J Affect Disord. 2019 Mar 15;247:20-28. doi: 10.1016/j.jad.2018.12.050. Epub 2018 Dec 18.


Large Language Models Versus Expert Clinicians in Crisis Prediction Among Telemental Health Patients: Comparative Study.

Affiliations

Brightside Health, San Francisco, CA, United States.

Publication Information

JMIR Ment Health. 2024 Aug 2;11:e58129. doi: 10.2196/58129.

DOI: 10.2196/58129
PMID: 38876484
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11329850/
Abstract

BACKGROUND

Due to recent advances in artificial intelligence, large language models (LLMs) have emerged as a powerful tool for a variety of language-related tasks, including sentiment analysis and the summarization of provider-patient interactions. However, there is limited research on these models in the area of crisis prediction.

OBJECTIVE

This study aimed to evaluate the performance of LLMs, specifically OpenAI's generative pretrained transformer 4 (GPT-4), in predicting current and future mental health crisis episodes using patient-provided information at intake among users of a national telemental health platform.

METHODS

Deidentified patient-provided data were pulled from specific intake questions of the Brightside telehealth platform, including the chief complaint, for 140 patients who indicated suicidal ideation (SI), and another 120 patients who later indicated SI with a plan during the course of treatment. Similar data were pulled for 200 randomly selected patients, treated during the same time period, who never endorsed SI. In total, 6 senior Brightside clinicians (3 psychologists and 3 psychiatrists) were shown patients' self-reported chief complaint and self-reported suicide attempt history but were blinded to the future course of treatment and other reported symptoms, including SI. They were asked a simple yes or no question regarding their prediction of endorsement of SI with plan, along with their confidence level about the prediction. GPT-4 was provided with similar information and asked to answer the same questions, enabling us to directly compare the performance of artificial intelligence and clinicians.
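The abstract does not publish the exact prompt given to GPT-4, so the following is only a minimal sketch of how the same blinded yes-or-no task might be posed to a model. All wording, function names, and the confidence scale here are illustrative assumptions, not the study's actual materials:

```python
def build_prompt(chief_complaint: str, attempt_history: str = "") -> str:
    """Assemble the intake fields shown to the blinded raters into a
    yes/no question. The wording is hypothetical, not the study's prompt."""
    parts = [f"Patient's self-reported chief complaint: {chief_complaint}"]
    if attempt_history:
        # In the second condition, raters also saw the attempt history.
        parts.append(f"Self-reported suicide attempt history: {attempt_history}")
    parts.append(
        "Based only on the information above, will this patient endorse "
        "suicidal ideation with a plan during treatment? Answer 'yes' or "
        "'no', then state your confidence (low/medium/high)."
    )
    return "\n".join(parts)


def parse_prediction(reply: str) -> bool:
    """Map a free-text model reply to a binary prediction."""
    return reply.strip().lower().startswith("yes")


print(parse_prediction("Yes - medium confidence"))  # True
print(parse_prediction("No."))                      # False
```

Because both the clinicians and the model answer the same binary question from the same limited fields, their predictions can be scored against the ground truth (later endorsement of SI with a plan) on identical terms.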

RESULTS

Overall, the clinicians' average precision (0.7) was higher than that of GPT-4 (0.6) in identifying the SI with plan at intake (n=140) versus no SI (n=200) when using the chief complaint alone, while sensitivity was higher for GPT-4 (0.62) than the clinicians' average (0.53). The addition of suicide attempt history increased the clinicians' average sensitivity (0.59) and precision (0.77) while decreasing the GPT-4 sensitivity (0.59) and precision (0.54). Performance decreased comparatively when predicting future SI with plan (n=120) versus no SI (n=200) with a chief complaint only, for both the clinicians (average sensitivity=0.4; average precision=0.59) and GPT-4 (sensitivity=0.46; precision=0.48). The addition of suicide attempt history increased performance comparatively for the clinicians (average sensitivity=0.46; average precision=0.69) and GPT-4 (sensitivity=0.74; precision=0.48).
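The sensitivity and precision figures above follow the standard confusion-matrix definitions. A small sketch with hypothetical counts (the abstract does not report the underlying confusion matrices, so the numbers below are merely chosen to be consistent with GPT-4's chief-complaint-only metrics):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true SI-with-plan cases that the rater flagged."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged cases that were truly SI with plan."""
    return tp / (tp + fp)


# Illustrative counts only: 87 of the 140 SI-with-plan cases flagged
# (sensitivity ~0.62), with 58 false alarms (precision ~0.60).
tp, fn, fp = 87, 53, 58
print(round(sensitivity(tp, fn), 2))  # 0.62
print(round(precision(tp, fp), 2))    # 0.6
```

The trade-off the abstract describes then reads naturally: adding attempt history made GPT-4 flag more of the future SI-with-plan cases (higher sensitivity) at the cost of more false alarms (lower or unchanged precision).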

CONCLUSIONS

GPT-4, with a simple prompt design, produced results on some metrics that approached those of a trained clinician. Additional work must be done before such a model can be piloted in a clinical setting. The model should undergo safety checks for bias, given evidence that LLMs can perpetuate the biases of the underlying data on which they are trained. We believe that LLMs hold promise for augmenting the identification of higher-risk patients at intake and potentially delivering more timely care to patients.
