Human-AI collectives most accurately diagnose clinical vignettes.

Author information

Zöller Nikolas, Berger Julian, Lin Irving, Fu Nathan, Komarneni Jayanth, Barabucci Gioele, Laskowski Kyle, Shia Victor, Harack Benjamin, Chu Eugene A, Trianni Vito, Kurvers Ralf H J M, Herzog Stefan M

Affiliations

Center for Adaptive Rationality, Max Planck Institute for Human Development, Berlin 14195, Germany.

The Human Diagnosis Project, San Francisco, CA 94110.

Publication information

Proc Natl Acad Sci U S A. 2025 Jun 17;122(24):e2426153122. doi: 10.1073/pnas.2426153122. Epub 2025 Jun 13.

Abstract

AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased: shortcomings that may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here, we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 text-based medical case vignettes. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
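
The abstract does not spell out how individual differential diagnoses are merged into a collective answer. As a rough illustration only, the sketch below pools ranked diagnosis lists with a reciprocal-rank score. The scoring rule, the function name `pool_differentials`, the optional `weights` parameter, and the example diagnoses are all assumptions made for illustration, not the authors' method or data.

```python
from collections import defaultdict


def pool_differentials(ranked_lists, weights=None):
    """Combine ranked diagnosis lists into one collective ranking.

    Each diagnosis earns weight / rank from every list it appears in.
    This reciprocal-rank rule is an illustrative assumption, not the
    aggregation rule reported in the paper.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ranked, weight in zip(ranked_lists, weights):
        for rank, diagnosis in enumerate(ranked, start=1):
            # Light normalization so trivially different spellings merge.
            scores[diagnosis.strip().lower()] += weight / rank
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical differentials for one made-up vignette.
physician_a = ["pneumonia", "pulmonary embolism", "bronchitis"]
physician_b = ["bronchitis", "pneumonia"]
llm = ["pulmonary embolism", "pneumonia", "heart failure"]

collective = pool_differentials([physician_a, physician_b, llm])
print(collective)
```

In this toy case no single contributor's list contains every candidate, yet the pooled ranking does. This mirrors, in a very simplified way, the abstract's point that humans and LLMs make different kinds of errors and so contribute complementary information.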

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4953/12184336/610bca45eaff/pnas.2426153122fig01.jpg