The Louis and Gabi Weisfeld School of Social Work, Bar-Ilan University, Ramat Gan, Israel.
Department of Counseling and Human Development, Faculty of Education, University of Haifa, Haifa, Israel.
J Clin Psychiatry. 2024 Oct 2;85(4):24m15365. doi: 10.4088/JCP.24m15365.
Suicide is a critical global health concern. Research indicates that generative artificial intelligence (GenAI) and large language models, such as generative pre-trained transformer-3.5 (GPT-3.5) and GPT-4, can evaluate suicide risk comparably to experts, yet the criteria these models use are unclear. This study explores how variations in prompts, specifically regarding past suicide attempts, gender, and age, influence the risk assessments provided by ChatGPT-3.5 and ChatGPT-4. Using a controlled, scenario-based approach, 8 vignettes were created. Both ChatGPT-3.5 and ChatGPT-4 were used to predict the likelihood of serious suicide attempts, suicide attempts, and suicidal thoughts. A univariate 3-way analysis of variance was conducted to analyze the effects of the independent variables (previous suicide attempts, gender, and age) on the dependent variables (likelihood of serious suicide attempts, suicide attempts, and suicidal thoughts). Both ChatGPT-3.5 and ChatGPT-4 recognized the importance of previous suicide attempts in predicting severe suicide risks and suicidal thoughts. ChatGPT-4 also identified gender differences, associating men with a higher risk, while both models disregarded age as a risk factor. Interaction analysis revealed that ChatGPT-3.5 associated past attempts with a higher likelihood of suicidal thoughts in men, whereas ChatGPT-4 showed an increased risk for women. The study highlights the potential of ChatGPT-3.5 and ChatGPT-4 in suicide risk evaluation, emphasizing the importance of prior attempts and gender, while noting differences in their handling of interactive effects and the negligible role of age. These findings reflect the complexity of GenAI decision-making. While promising for suicide risk assessment, these models require careful application due to limitations and real-world complexities.
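The 8 vignettes correspond to a full 2 × 2 × 2 factorial crossing of the three independent variables (previous suicide attempts, gender, and age). A minimal sketch of that design is shown below; the factor wording and the prompt template are illustrative assumptions, not the study's actual vignette text.

```python
from itertools import product

# Three binary independent variables, as in the study's 2 x 2 x 2 design.
# The level labels here are hypothetical stand-ins, not the published wording.
FACTORS = {
    "previous_attempt": ["no prior suicide attempt", "a prior suicide attempt"],
    "gender": ["a man", "a woman"],
    "age": ["younger", "older"],
}

def build_vignettes():
    """Return all 8 factor combinations, each with an illustrative prompt."""
    vignettes = []
    for attempt, gender, age in product(*FACTORS.values()):
        prompt = (
            f"Consider {gender}, {age} in age, with {attempt}. "
            "Rate the likelihood of suicidal thoughts, a suicide attempt, "
            "and a serious suicide attempt."
        )
        vignettes.append({
            "previous_attempt": attempt,
            "gender": gender,
            "age": age,
            "prompt": prompt,
        })
    return vignettes

vignettes = build_vignettes()
print(len(vignettes))  # 8 combinations
```

Each model's likelihood ratings for these 8 cells would then feed the univariate 3-way ANOVA described in the abstract, with the three factors as between-vignette independent variables.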