Department of Psychology and Educational Counseling, The Center for Psychobiological Research, Max Stern Yezreel Valley College, Yezreel Valley, Israel
Department of Brain Sciences, Imperial College London, London, UK.
Fam Med Community Health. 2024 Jan 9;12(Suppl 1):e002583. doi: 10.1136/fmch-2023-002583.
Artificial intelligence (AI) has rapidly permeated various sectors, including healthcare, highlighting its potential to facilitate mental health assessments. This study examines the underexplored domain of AI's role in evaluating prognosis and long-term outcomes in depressive disorders, offering insights into how AI large language models (LLMs) compare with human perspectives.
Using case vignettes, we conducted a comparative analysis involving different LLMs (ChatGPT-3.5, ChatGPT-4, Claude and Bard), mental health professionals (general practitioners, psychiatrists, clinical psychologists and mental health nurses), and the general public, as reported previously. We evaluated the LLMs' ability to generate prognoses, anticipated outcomes with and without professional intervention, and envisioned long-term positive and negative consequences for individuals with depression.
In most of the examined cases, the four LLMs consistently identified depression as the primary diagnosis and recommended a combined treatment of psychotherapy and antidepressant medication. ChatGPT-3.5 exhibited a significantly more pessimistic prognosis than the other LLMs, the professionals and the public. ChatGPT-4, Claude and Bard aligned closely with the perspectives of mental health professionals and the general public, all of whom anticipated either no improvement or worsening without professional help. Regarding long-term outcomes, ChatGPT-3.5, Claude and Bard consistently projected significantly fewer negative long-term consequences of treatment than ChatGPT-4.
This study underscores the potential of AI to complement the expertise of mental health professionals and promote a collaborative paradigm in mental healthcare. The observation that three of the four LLMs closely mirrored the anticipations of mental health experts in scenarios involving treatment underscores the technology's prospective value in offering professional clinical forecasts. The pessimistic outlook presented by ChatGPT-3.5 is concerning, as it could diminish patients' motivation to initiate or continue depression therapy. In summary, although LLMs show promise in enhancing healthcare services, their utilisation requires thorough verification and seamless integration with human judgement and skills.