

Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study

Authors

Levkovich Inbar, Elyoseph Zohar

Affiliations

Oranim Academic College, Faculty of Graduate Studies, Kiryat Tivon, Israel.

Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Emek Yezreel, Israel.

Publication

JMIR Ment Health. 2023 Sep 20;10:e51232. doi: 10.2196/51232.

DOI: 10.2196/51232
PMID: 37728984
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10551796/
Abstract

BACKGROUND

ChatGPT, a linguistic artificial intelligence (AI) model engineered by OpenAI, offers prospective contributions to mental health professionals. Although it has significant theoretical implications, ChatGPT's practical capabilities, particularly regarding suicide prevention, have not yet been substantiated.

OBJECTIVE

The study's aim was to evaluate ChatGPT's ability to assess suicide risk over a 2-month period, taking into consideration 2 discernible factors: perceived burdensomeness and thwarted belongingness. In addition, we evaluated whether ChatGPT-4 evaluated suicide risk more accurately than ChatGPT-3.5.

METHODS

ChatGPT was tasked with assessing a vignette that depicted a hypothetical patient exhibiting differing degrees of perceived burdensomeness and thwarted belongingness. The assessments generated by ChatGPT were subsequently contrasted with standard evaluations rendered by mental health professionals. Using both ChatGPT-3.5 and ChatGPT-4 (May 24, 2023), we executed 3 evaluative procedures in June and July 2023. Our intent was to scrutinize ChatGPT-4's proficiency in assessing various facets of suicide risk in relation to the evaluative abilities of both mental health professionals and an earlier version of ChatGPT-3.5 (March 14 version).

RESULTS

During the period of June and July 2023, we found that the likelihood of suicide attempts as evaluated by ChatGPT-4 was similar to the norms of mental health professionals (n=379) under all conditions (average Z score of 0.01). Nonetheless, a pronounced discrepancy was observed regarding the assessments performed by ChatGPT-3.5 (May version), which markedly underestimated the potential for suicide attempts, in comparison to the assessments carried out by the mental health professionals (average Z score of -0.83). The empirical evidence suggests that ChatGPT-4's evaluation of the incidence of suicidal ideation and psychache was higher than that of the mental health professionals (average Z score of 0.47 and 1.00, respectively). Conversely, the level of resilience as assessed by both ChatGPT-4 and ChatGPT-3.5 (both versions) was observed to be lower in comparison to the assessments offered by mental health professionals (average Z score of -0.89 and -0.90, respectively).
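For orientation, the Z scores reported above presumably standardize each model's rating against the distribution of the mental health professionals' ratings (an assumption about the normalization; the exact procedure is described in the full text):

$$Z = \frac{x_{\text{ChatGPT}} - \mu_{\text{professionals}}}{\sigma_{\text{professionals}}}$$

On this reading, a Z score near 0 (ChatGPT-4, likelihood of suicide attempts) indicates agreement with the professional norms, while −0.83 (ChatGPT-3.5) indicates ratings nearly a full standard deviation below them.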

CONCLUSIONS

The findings suggest that ChatGPT-4 estimates the likelihood of suicide attempts in a manner akin to evaluations provided by professionals. In terms of recognizing suicidal ideation, ChatGPT-4 appears to be more precise. However, regarding psychache, there was an observed overestimation by ChatGPT-4, indicating a need for further research. These results have implications regarding ChatGPT-4's potential to support gatekeepers, patients, and even mental health professionals' decision-making. Despite the clinical potential, intensive follow-up studies are necessary to establish the use of ChatGPT-4's capabilities in clinical practice. The finding that ChatGPT-3.5 frequently underestimates suicide risk, especially in severe cases, is particularly troubling. It indicates that ChatGPT may downplay one's actual suicide risk level.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/a07a7d0ed249/mental_v10i1e51232_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/4107426fb3e7/mental_v10i1e51232_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/5f67e8942ad7/mental_v10i1e51232_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c988/10551796/14370bdaa24d/mental_v10i1e51232_fig4.jpg

Similar Articles

1. Suicide Risk Assessments Through the Eyes of ChatGPT-3.5 Versus ChatGPT-4: Vignette Study.
JMIR Ment Health. 2023 Sep 20;10:e51232. doi: 10.2196/51232.
2. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment.
Front Psychiatry. 2023 Aug 1;14:1213141. doi: 10.3389/fpsyt.2023.1213141. eCollection 2023.
3. Exploring the Potential of ChatGPT-4 in Predicting Refractive Surgery Categorizations: Comparative Study.
JMIR Form Res. 2023 Dec 28;7:e51798. doi: 10.2196/51798.
4. Integrating Previous Suicide Attempts, Gender, and Age Into Suicide Risk Assessment Using Advanced Artificial Intelligence Models.
J Clin Psychiatry. 2024 Oct 2;85(4):24m15365. doi: 10.4088/JCP.24m15365.
5. The effect of perceived burdensomeness and thwarted belongingness on therapists' assessment of patients' suicide risk.
Psychother Res. 2016 Jul;26(4):436-45. doi: 10.1080/10503307.2015.1013161. Epub 2015 Mar 9.
6. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
7. Evaluating ChatGPT-4's Diagnostic Accuracy: Impact of Visual Data Integration.
JMIR Med Inform. 2024 Apr 9;12:e55627. doi: 10.2196/55627.
8. ChatGPT outperforms humans in emotional awareness evaluations.
Front Psychol. 2023 May 26;14:1199058. doi: 10.3389/fpsyg.2023.1199058. eCollection 2023.
9. Exploring the Performance of ChatGPT Versions 3.5, 4, and 4 With Vision in the Chilean Medical Licensing Examination: Observational Study.
JMIR Med Educ. 2024 Apr 29;10:e55048. doi: 10.2196/55048.
10. Comparing the Perspectives of Generative AI, Mental Health Experts, and the General Public on Schizophrenia Recovery: Case Vignette Study.
JMIR Ment Health. 2024 Mar 18;11:e53043. doi: 10.2196/53043.

Cited By

1. Evaluating Generative Pretrained Transformer (GPT) models for suicide risk assessment in synthetic patient journal entries.
BMC Psychiatry. 2025 Aug 1;25(1):753. doi: 10.1186/s12888-025-07088-5.
2. Comparing ChatGPT and validated questionnaires in assessing loneliness and online social support among college students: a cross-sectional study.
Sci Rep. 2025 Jul 1;15(1):20621. doi: 10.1038/s41598-025-06358-2.
3. The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review.
JMIR Ment Health. 2025 Jun 27;12:e70610. doi: 10.2196/70610.
4. A step toward the future? evaluating GenAI QPR simulation training for mental health gatekeepers.
Front Med (Lausanne). 2025 Jun 11;12:1599900. doi: 10.3389/fmed.2025.1599900. eCollection 2025.
5. Applying language models for suicide prevention: evaluating news article adherence to WHO reporting guidelines.
Npj Ment Health Res. 2025 Jun 20;4(1):25. doi: 10.1038/s44184-025-00139-5.
6. The role of generative artificial intelligence in evaluating adherence to responsible press media reports on suicide: A multisite, three-language study.
Eur Psychiatry. 2025 May 27;68(1):e81. doi: 10.1192/j.eurpsy.2025.10037.
7. Clinical insights: A comprehensive review of language models in medicine.
PLOS Digit Health. 2025 May 8;4(5):e0000800. doi: 10.1371/journal.pdig.0000800. eCollection 2025 May.
8. The Applications of Large Language Models in Mental Health: Scoping Review.
J Med Internet Res. 2025 May 5;27:e69284. doi: 10.2196/69284.
9. Effectiveness of generative AI-large language models' recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model.
Front Psychiatry. 2025 Apr 3;16:1544951. doi: 10.3389/fpsyt.2025.1544951. eCollection 2025.
10. Exploring Biases of Large Language Models in the Field of Mental Health: Comparative Questionnaire Study of the Effect of Gender and Sexual Orientation in Anorexia Nervosa and Bulimia Nervosa Case Vignettes.
JMIR Ment Health. 2025 Mar 20;12:e57986. doi: 10.2196/57986.

References

1. Improved Performance of ChatGPT-4 on the OKAP Examination: A Comparative Study with ChatGPT-3.5.
J Acad Ophthalmol (2017). 2023 Sep 11;15(2):e184-e187. doi: 10.1055/s-0043-1774399. eCollection 2023 Jul.
2. Beyond human expertise: the promise and limitations of ChatGPT in suicide risk assessment.
Front Psychiatry. 2023 Aug 1;14:1213141. doi: 10.3389/fpsyt.2023.1213141. eCollection 2023.
3. ChatGPT and the Future of Digital Health: A Study on Healthcare Workers' Perceptions and Expectations.
Healthcare (Basel). 2023 Jun 21;11(13):1812. doi: 10.3390/healthcare11131812.
4. ChatGPT, GPT-4, and Other Large Language Models: The Next Revolution for Clinical Microbiology?
Clin Infect Dis. 2023 Nov 11;77(9):1322-1328. doi: 10.1093/cid/ciad407.
5. Reliability of Medical Information Provided by ChatGPT: Assessment Against Clinical Guidelines and Patient Information Quality Instrument.
J Med Internet Res. 2023 Jun 30;25:e47479. doi: 10.2196/47479.
6. Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: Comparison Study.
JMIR Med Educ. 2023 Jun 29;9:e48002. doi: 10.2196/48002.
7. Chat GPT-4 significantly surpasses GPT-3.5 in drug information queries.
J Telemed Telecare. 2025 Feb;31(2):306-308. doi: 10.1177/1357633X231181922. Epub 2023 Jun 22.
8. ChatGPT outperforms humans in emotional awareness evaluations.
Front Psychol. 2023 May 26;14:1199058. doi: 10.3389/fpsyg.2023.1199058. eCollection 2023.
9. The Advent of Generative Language Models in Medical Education.
JMIR Med Educ. 2023 Jun 6;9:e48163. doi: 10.2196/48163.
10. User Intentions to Use ChatGPT for Self-Diagnosis and Health-Related Purposes: Cross-sectional Survey Study.
JMIR Hum Factors. 2023 May 17;10:e47564. doi: 10.2196/47564.