Evaluating large language models as agents in the clinic.

Author information

Mehandru Nikita, Miao Brenda Y, Almaraz Eduardo Rodriguez, Sushil Madhumita, Butte Atul J, Alaa Ahmed

Affiliations

University of California, Berkeley, 2195 Hearst Ave, Warren Hall Suite, 120C, Berkeley, CA, USA.

Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.

Publication information

NPJ Digit Med. 2024 Apr 3;7(1):84. doi: 10.1038/s41746-024-01083-y.

Abstract

Recent developments in large language models (LLMs) have unlocked opportunities for healthcare, from information synthesis to clinical decision support. These LLMs are not only capable of modeling language but can also act as intelligent “agents” that interact with stakeholders in open-ended conversations and even influence clinical decision-making. Rather than being assessed with benchmarks that measure a model’s ability to process clinical data or answer standardized test questions, LLM agents can be modeled in high-fidelity simulations of clinical settings and should be evaluated for their impact on clinical workflows. These evaluation frameworks, which we refer to as “Artificial Intelligence Structured Clinical Examinations” (“AI-SCE”), can draw from comparable technologies in which machines operate with varying degrees of self-governance in dynamic environments with multiple stakeholders, such as self-driving cars. Developing these robust, real-world clinical evaluations will be crucial for deploying LLM agents in medical settings.
