

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

Authors

Levkovich Inbar, Elyoseph Zohar

Affiliations

Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel.

Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel.

Publication

JMIR Ment Health. 2023 Sep 20;10:e51232. doi: 10.2196/51232.

DOI: 10.2196/51232
PMID: 37728984
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10551796/
Abstract

BACKGROUND

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although it has significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated.

OBJECTIVE

The study's aim was to evaluate ChatGPT's ability to assess suicide risk over a 2-month period, taking into consideration 2 discernible factors: perceived burdensomeness and thwarted belongingness. In addition, we evaluated whether ChatGPT-4 evaluated suicide risk more accurately than ChatGPT-3.5.

METHODS

ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version).

RESULTS

During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively).
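For orientation, the Z scores reported above presumably standardize each model's rating against the distribution of the mental health professionals' ratings (an assumption about the normalization; the exact procedure is described in the full text):

$$Z = \frac{x_{\text{ChatGPT}} - \mu_{\text{professionals}}}{\sigma_{\text{professionals}}}$$

On this reading, a Z score near 0 (ChatGPT-4, likelihood of suicide attempts) indicates agreement with the professional norms, while −0.83 (ChatGPT-3.5) indicates ratings nearly a full standard deviation below them.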

CONCLUSIONS

The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/a07a7d0ed249/mental_v10i1e51232_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/4107426fb3e7/mental_v10i1e51232_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/5f67e8942ad7/mental_v10i1e51232_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/14370bdaa24d/mental_v10i1e51232_fig4.jpg

Similar Articles

1. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study.
JMIR Ment Health. 2023 Sep 20;10:e51232. doi: 10.2196/51232.
2. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment.
Front Psychiatry. 2023 Aug 1;14:1213141. doi: 10.3389/fpsyt.2023.1213141. eCollection 2023.
3. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study.
JMIR Form Res. 2023 Dec 28;7:e51798. doi: 10.2196/51798.
4. Integrating Previous Suicide Attempts, Gender, and Age Into Suicide Risk Assessment Using Advanced Artificial Intelligence Models.
J Clin Psychiatry. 2024 Oct 2;85(4):24m15365. doi: 10.4088/JCP.24m15365.
5. The effect of perceived burdensomeness and thwarted belongingness on therapists' assessment of patients' suicide risk.
Psychother Res. 2016 Jul;26(4):436-45. doi: 10.1080/10503307.2015.1013161. Epub 2015 Mar 9.
6. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
7. Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
8. ChatGPT outperforms humans in emotional awareness evaluations.
Front Psychol. 2023 May 26;14:1199058. doi: 10.3389/fpsyg.2023.1199058. eCollection 2023.
9. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
10. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study.
JMIR Ment Health. 2024 Mar 18;11:e53043. doi: 10.2196/53043.

Cited By

1. Evaluating Generative Pretrained Transformer (GPT) models for suicide risk assessment in synthetic patient journal entries.
BMC Psychiatry. 2025 Aug 1;25(1):753. doi: 10.1186/s12888-025-07088-5.
2. Comparing ChatGPT and validated questionnaires in assessing loneliness and online social support among college students: a cross-sectional study.
Sci Rep. 2025 Jul 1;15(1):20621. doi: 10.1038/s41598-025-06358-2.
3. The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610.
4. A step toward the future? evaluating GenAI QPR simulation training for mental health gatekeepers.
Front Med (Lausanne). 2025 Jun 11;12:1599900. doi: 10.3389/fmed.2025.1599900. eCollection 2025.
5. Applying language models for suicide prevention: evaluating news article adherence to WHO reporting guidelines.
Npj Ment Health Res. 2025 Jun 20;4(1):25. doi: 10.1038/s44184-025-00139-5.
6. The role of generative artificial intelligence in evaluating adherence to responsible press media reports on suicide: A multisite, three-language study.
Eur Psychiatry. 2025 May 27;68(1):e81. doi: 10.1192/j.eurpsy.2025.10037.
7. Clinical insights: A comprehensive review of language models in medicine.
PLOS Digit Health. 2025 May 8;4(5):e0000800. doi: 10.1371/journal.pdig.0000800. eCollection 2025 May.
8. The Applications of Large Language Models in Mental Health: Scoping Review.
J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284.
9. Effectiveness of generative AI-large language models' recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model.
Front Psychiatry. 2025 Apr 3;16:1544951. doi: 10.3389/fpsyt.2025.1544951. eCollection 2025.
10. Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes.
JMIR Ment Health. 2025 Mar 20;12:e57986. doi: 10.2196/57986.

References

1. Improved Performance of ChatGPT-4 on the OKAP Examination: A Comparative Study with ChatGPT-3.5.
J Acad Ophthalmol (2017). 2023 Sep 11;15(2):e184-e187. doi: 10.1055/s-0043-1774399. eCollection 2023 Jul.
2. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment.
Front Psychiatry. 2023 Aug 1;14:1213141. doi: 10.3389/fpsyt.2023.1213141. eCollection 2023.
3. ChatGPT and the Future of Digital Health: A Study on Healthcare Workers' Perceptions and Expectations.
Healthcare (Basel). 2023 Jun 21;11(13):1812. doi: 10.3390/healthcare11131812.
4. ChatGPT, GPT-4, and Other Large Language Models: The Next Revolution for Clinical Microbiology?
Clin Infect Dis. 2023 Nov 11;77(9):1322-1328. doi: 10.1093/cid/ciad407.
5. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument.
J Med Internet Res. 2023 Jun 30;25:e47479. doi: 10.2196/47479.
6. Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.
JMIR Med Educ. 2023 Jun 29;9:e48002. doi: 10.2196/48002.
7. Chat GPT-4 significantly surpasses GPT-3.5 in drug information queries.
J Telemed Telecare. 2025 Feb;31(2):306-308. doi: 10.1177/1357633X231181922. Epub 2023 Jun 22.
8. ChatGPT outperforms humans in emotional awareness evaluations.
Front Psychol. 2023 May 26;14:1199058. doi: 10.3389/fpsyg.2023.1199058. eCollection 2023.
9. The Advent of Generative Language Models in Medical Education.
JMIR Med Educ. 2023 Jun 6;9:e48163. doi: 10.2196/48163.
10. User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study.
JMIR Hum Factors. 2023 May 17;10:e47564. doi: 10.2196/47564.