Charting the evolution of artificial intelligence mental health chatbots from rule-based systems to large language models: a systematic review.

Author Information

Hua Yining, Siddals Steve, Ma Zilin, Galatzer-Levy Isaac, Xia Winna, Hau Christine, Na Hongbin, Flathers Matthew, Linardon Jake, Ayubcha Cyrus, Torous John

Affiliations

Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.

Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.

Publication Information

World Psychiatry. 2025 Oct;24(3):383-394. doi: 10.1002/wps.21352.

Abstract

The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020-2024) classifies chatbot architectures - rule-based, machine learning-based, and large language model (LLM)-based - and proposes a three-tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule-based systems dominated until 2023, LLM-based chatbots surged to 45% of new studies in 2024. However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. Discrepancies emerged between marketed claims ("AI-powered") and actual AI architectures, with many interventions relying on simple rule-based scripts. LLM-based chatbots are increasingly studied for emotional support and psychoeducation, yet they pose unique ethical concerns, including incorrect responses, privacy risks, and unverified therapeutic effects. Despite their generative capabilities, LLMs remain largely untested in high-stakes mental health contexts. This paper emphasizes the need for standardized evaluation and benchmarking aligned with medical AI certification to ensure safe, transparent and ethical deployment. The proposed framework enables clearer distinctions between technical novelty and clinical efficacy, offering clinicians, researchers and regulators ordered steps to guide future standards and benchmarks. To ensure that AI chatbots enhance mental health care, future research must prioritize rigorous clinical efficacy trials, transparent architecture reporting, and evaluations that reflect real-world impact rather than the well-known potential.
