
Evaluating generative AI responses to real-world drug-related questions.

Affiliations

National Institute on Drug Abuse, Baltimore, MD, USA; University of Pennsylvania, Philadelphia, PA, USA.

National Institute on Drug Abuse, Baltimore, MD, USA.

Publication Information

Psychiatry Res. 2024 Sep;339:116058. doi: 10.1016/j.psychres.2024.116058. Epub 2024 Jun 26.

Abstract

Generative Artificial Intelligence (AI) systems such as OpenAI's ChatGPT, which can generate human-like text and converse in real time with unprecedented fluency, hold potential for large-scale deployment in clinical settings such as substance use treatment. Treatment for substance use disorders (SUDs) is particularly high stakes, requiring evidence-based clinical treatment, mental health expertise, and peer support. The promise of AI systems to compensate for deficient healthcare resources and structural bias is therefore especially relevant in this domain, particularly in anonymous settings. This study explores the effectiveness of generative AI in answering real-world substance use and recovery questions. We collect questions from online recovery forums, generate responses with ChatGPT and Meta's LLaMA-2, and have SUD clinicians rate these AI responses. While clinicians rated the AI-generated responses as high quality, we discovered instances of dangerous disinformation, including disregard for suicidal ideation, incorrect emergency helplines, and endorsement of home detox. Moreover, the AI systems produced inconsistent advice depending on how a question was phrased. These findings reveal a risky combination: responses that appear high quality and accurate on initial inspection can nonetheless contain inaccurate and potentially deadly medical advice. Consequently, while generative AI shows promise, its real-world application in sensitive healthcare domains necessitates further safeguards and clinical validation.
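To make the described pipeline concrete, the following is a minimal sketch of the response-collection step: feed each forum-sourced question to a chat model and store the reply for later clinician review. This assumes the `openai` Python client; the model name (`gpt-3.5-turbo`), the prompt wrapping, and the file names (`forum_questions.txt`, `ai_responses.csv`) are illustrative assumptions, since the paper does not publish its exact setup.

```python
# Sketch of the response-collection step described in the abstract:
# send each recovery-forum question to a chat model and save the reply
# for clinician rating. Model, prompting, and file layout are assumptions.
import csv

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_model(question: str, model: str = "gpt-3.5-turbo") -> str:
    """Send one forum question to the model and return its reply text."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content


with open("forum_questions.txt") as f, \
        open("ai_responses.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["question", "response"])
    for line in f:
        question = line.strip()
        if question:
            writer.writerow([question, ask_model(question)])
```

A parallel loop over a locally hosted LLaMA-2 model would follow the same pattern, and re-running the loop over paraphrases of each question is one way to probe the phrasing-dependent inconsistency the study reports.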


