Cabrera Lozoya Daniel, Conway Mike, Sebastiano De Duro Edoardo, D'Alfonso Simon
School of Computing and Information Systems, The University of Melbourne, Grattan St, Parkville VIC 3010, Melbourne, Australia, 61 90355511.
Department of Psychology and Cognitive Science, University of Trento, Trento, Italy.
JMIR Med Educ. 2025 Jul 31;11:e68056. doi: 10.2196/68056.
In recent years, large language models (LLMs) have shown a remarkable ability to generate human-like text. One potential application of this capability is using LLMs to simulate clients in a mental health context. This research presents the development and evaluation of Client101, a web conversational platform featuring LLM-driven chatbots designed to simulate mental health clients.
We aim to develop and test a web-based conversational psychotherapy training tool designed to closely resemble clients with mental health issues.
We used GPT-4 and prompt engineering techniques to develop chatbots that simulate realistic client conversations. Two chatbots were created based on clinical vignette cases: one representing a person with depression and the other, a person with generalized anxiety disorder. A total of 16 mental health professionals were instructed to conduct single sessions with the chatbots using a cognitive behavioral therapy framework; a total of 15 sessions with the anxiety chatbot and 14 with the depression chatbot were completed. After each session, participants completed a 19-question survey assessing the chatbot's ability to simulate the mental health condition and its potential as a training tool. Additionally, we used the LIWC (Linguistic Inquiry and Word Count) tool to analyze the psycholinguistic features of the chatbot conversations related to anxiety and depression. These features were compared to those in a set of webchat psychotherapy sessions with human clients (42 sessions related to anxiety and 47 related to depression) using an independent samples t test.
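The feature comparison described above can be illustrated with a pooled-variance (Student's) independent-samples t test over per-session LIWC scores. The sketch below is a minimal self-contained implementation; the sample values are hypothetical "negations" percentages invented for illustration, not the study's data.

```python
from math import sqrt

def independent_t(a, b):
    """Student's independent-samples t test with pooled variance.

    Returns (t statistic, degrees of freedom) for two samples a and b.
    """
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Sums of squared deviations for each sample.
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

# Hypothetical per-session "negations" scores (illustrative only):
chatbot_sessions = [2.1, 1.8, 2.4, 2.0, 2.2]
human_sessions = [1.2, 1.5, 1.1, 1.6, 1.4, 1.3]

t, df = independent_t(chatbot_sessions, human_sessions)
```

In the study, the degrees of freedom reported with each statistic (for example, t56 for the anxiety comparisons) follow the same n1 + n2 - 2 formula used here; a library routine such as SciPy's `ttest_ind` would additionally return the P value.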
Participants' survey responses were predominantly positive regarding the chatbots' realism and portrayal of mental health conditions. For instance, 93% (14/15) agreed that the chatbot provided a coherent and convincing narrative typical of someone with an anxiety condition. The statistical analysis of LIWC psycholinguistic features revealed significant differences between chatbot and human therapy transcripts for 3 of 8 anxiety-related features: negations (t56=4.03, P=.001), family (t56=-8.62, P=.001), and negative emotions (t56=-3.91, P=.002). The remaining 5 features (sadness, personal pronouns, present focus, social, and anger) did not show significant differences. For depression-related features, 4 of 9 showed significant differences: negative emotions (t60=-3.84, P=.003), feeling (t60=-6.40, P<.001), health (t60=-4.13, P=.001), and illness (t60=-5.52, P<.001). The other 5 features (sadness, anxiety, mental, first-person pronouns, and discrepancy) did not show statistically significant differences.
This research underscores both the strengths and limitations of using GPT-4-powered chatbots as tools for psychotherapy training. Participant feedback suggests that the chatbots effectively portray mental health conditions and are generally perceived as valuable training aids. However, differences in specific psycholinguistic features suggest targeted areas for enhancement, helping refine Client101's effectiveness as a tool for training mental health professionals.