Barabas Lubomir, Novotny Michal, Jung Dennis, Müller Thomas, Mertse Nicolas Nagysomkuti
Private Clinic for Psychiatry and Psychotherapy Meiringen, Meiringen, Switzerland.
conceito GmbH, Leutzenheldstraße 4, 76327 Pfinztal, Germany.
Nervenarzt. 2025 Jun 6. doi: 10.1007/s00115-025-01837-3.
This exploratory study tested ChatGPT as a digital advisor chatbot for German-speaking individuals in acute psychiatric crises. Additionally, the attitudes of young physicians and psychologists towards the use of large language models (LLMs) in healthcare were investigated.
In total, 20 resident physicians and psychologists simulated patients in three clinical scenarios (depression, psychosis, adjustment disorder) and interacted with ChatGPT. They evaluated the chatbot's performance regarding overall experience, pleasantness, appropriateness of the responses, realism, and helpfulness. Their attitudes towards such a chatbot were assessed before and after the intervention. Finally, they rated 12 statements about the future of LLMs in healthcare and provided open-ended feedback on the chat experience.
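The abstract does not describe the technical setup of the chat interaction. As a purely illustrative sketch, a simulated-patient turn with ChatGPT could be driven through the OpenAI Python SDK roughly as follows; the model name, system prompt, and scenario openings are assumptions, not details from the study.

```python
# Hypothetical sketch of one simulated-patient turn with ChatGPT via the
# OpenAI Python SDK. The study abstract does not describe the actual setup;
# the model name, system prompt, and scenario texts are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCENARIOS = {
    "depression": "Ich fühle mich seit Wochen leer und kann nachts nicht schlafen.",
    "psychosis": "Ich höre Stimmen, die mir sagen, dass man mich beobachtet.",
    "adjustment_disorder": "Seit meiner Kündigung komme ich mit nichts mehr zurecht.",
}

def crisis_chat_turn(scenario: str, history: list[dict]) -> str:
    """Send one simulated-patient message and return the chatbot's reply."""
    messages = [
        {"role": "system",
         "content": "Du bist ein unterstützender Erstberatungs-Chatbot für "
                    "Menschen in akuten psychischen Krisen. Antworte auf Deutsch."},
        *history,
        {"role": "user", "content": SCENARIOS[scenario]},
    ]
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; the study's exact model is not stated
        messages=messages,
        temperature=0.7,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(crisis_chat_turn("depression", history=[]))
```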
ChatGPT received predominantly positive ratings (over 8/10 points) for overall experience, helpfulness, pleasantness, and appropriateness, while realism was rated slightly lower at 7/10 points. The appropriateness of the responses varied significantly between the scenarios, with lower ratings for the psychosis scenario. Open feedback confirmed the limited suitability of ChatGPT for psychosis patients. Overall, 70% or more of the participants agreed that LLMs will become increasingly important in everyday life and healthcare, and that an LLM-based chatbot would be a modern tool for low-threshold access to initial psychiatric aid. However, the high number of neutral responses across all 12 items (20-45%) indicates uncertainty regarding the actual benefits and risks.
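The abstract reports that appropriateness ratings differed significantly across scenarios but does not name the statistical test. A minimal sketch of one plausible within-rater comparison, using made-up ratings and a Friedman test from SciPy, is shown below; none of these numbers come from the study.

```python
# Minimal sketch of a within-rater comparison of appropriateness ratings:
# each of 20 raters scores all three scenarios (0-10), and a Friedman test
# checks whether ratings differ by scenario. Data are hypothetical; the
# abstract does not state which test the authors actually used.
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
n_raters = 20

# Hypothetical ratings, with the psychosis scenario scored lower, as reported.
depression = rng.integers(7, 11, size=n_raters)
adjustment = rng.integers(7, 11, size=n_raters)
psychosis = rng.integers(4, 9, size=n_raters)

stat, p_value = friedmanchisquare(depression, psychosis, adjustment)
print(f"Friedman chi-square = {stat:.2f}, p = {p_value:.3f}")
```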
The performance of ChatGPT was rated positively overall by the participants. Significant practical and methodological limitations remain, however, highlighting the need for further research involving real patients to support a gradual, carefully monitored integration of LLMs into mental healthcare.