Scholich Till, Barr Maya, Wiltsey Stirman Shannon, Raj Shriti
Institute for Human-Centered AI, Stanford University, Stanford, CA, United States.
PGSP-Stanford PsyD Consortium, Palo Alto University, Palo Alto, CA, United States.
JMIR Ment Health. 2025 May 21;12:e69709. doi: 10.2196/69709.
Consumers are increasingly turning to large language model-based chatbots for mental health advice or intervention, owing to their ease of access and the limited availability of mental health professionals. However, the suitability and safety of these chatbots for mental health applications remain underexplored, particularly in comparison with professional therapeutic practice.
This study aimed to evaluate how general-purpose chatbots respond to mental health scenarios and compare their responses to those provided by licensed therapists. Specifically, we sought to identify chatbots' strengths and limitations, as well as the ethical and practical considerations necessary for their use in mental health care.
We conducted a mixed methods study comparing responses from chatbots and licensed therapists to scripted mental health scenarios. We created 2 fictional scenarios and prompted 3 chatbots with them, producing 6 interaction logs. We then recruited 17 therapists and conducted study sessions consisting of 3 activities. First, therapists responded to the 2 scenarios using a Qualtrics form. Second, therapists reviewed the 6 interaction logs using a think-aloud procedure, commenting on the chatbots' responses. Finally, we conducted a semistructured interview to explore their subjective opinions on the use of chatbots for supporting mental health. The study sessions were analyzed using thematic analysis. Chatbot and therapist responses were coded using the Multitheoretical List of Therapeutic Interventions (MULTI) and then compared.
We identified 7 themes describing the strengths and limitations of the chatbots relative to therapists: elements of good therapy in chatbot responses, the conversational style of chatbots, insufficient inquiry and feedback seeking by chatbots, chatbot interventions, client engagement, chatbots' responses to crisis situations, and considerations for chatbot-based therapy. In the MULTI coding, we found that therapists evoked more elaboration (Mann-Whitney U=9; P=.001) and used more self-disclosure (U=45.5; P=.37) than the chatbots. The chatbots used affirming (U=28; P=.045) and reassuring (U=23; P=.02) language more often than the therapists, and they also used psychoeducation (U=22.5; P=.02) and suggestions (U=12.5; P=.003) more often.
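For readers less familiar with the Mann-Whitney U test reported above, the following minimal Python sketch illustrates how such a between-group comparison of MULTI code frequencies could be computed. The counts are invented placeholders, not the study's data, and the abstract does not specify the authors' actual analysis pipeline.

# Minimal sketch (hypothetical data): comparing how often one MULTI code
# (eg, "suggestion") appears in chatbot vs therapist responses using a
# two-sided Mann-Whitney U test.
from scipy.stats import mannwhitneyu

chatbot_counts = [5, 7, 6, 8, 4, 9]  # invented per-log code frequencies (6 chatbot logs)
therapist_counts = [1, 2, 0, 3, 1, 2, 1, 0, 2,
                    1, 3, 0, 1, 2, 1, 0, 2]  # invented frequencies (17 therapist responses)

u_stat, p_value = mannwhitneyu(chatbot_counts, therapist_counts, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, P = {p_value:.3f}")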
Our study demonstrates that general-purpose chatbots are unsuitable for safely engaging in mental health conversations, particularly in crisis situations. While the chatbots displayed elements of good therapy, such as validation and reassurance, their overuse of directive advice without sufficient inquiry and their reliance on generic interventions make them unsuitable as therapeutic agents. Careful research and evaluation will be necessary to determine the impact of chatbot interactions and to identify the most appropriate mental health-related use cases.