Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study.

Affiliations

Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore.

Department of Anesthesiology, Singapore General Hospital, Singapore, Singapore.

Publication information

J Med Internet Res. 2024 Nov 19;26:e59439. doi: 10.2196/59439.

Abstract

BACKGROUND

Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field.

OBJECTIVE

This study aimed to explore the role of large language models (LLMs) in mitigating these biases through a multi-agent framework. We simulated clinical decision-making through multi-agent conversations and evaluated the framework's efficacy in improving diagnostic accuracy compared with humans.

METHODS

A total of 16 published and unpublished case reports in which cognitive biases resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil's advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. The accuracy of the initial diagnoses and of the final differential diagnoses was evaluated, and comparisons with human-generated answers were made using the Fisher exact test.
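The role assignment described above can be sketched as a small data structure plus a turn-taking loop. This is a minimal illustration, not the study's implementation: the names `Role`, `ROLES`, `run_discussion`, and `echo_responder` are hypothetical, the speaking order and number of rounds are assumptions, and the `respond` callable is a stand-in for whatever call to GPT-4 the authors used (no real API is invoked here).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Role:
    """One simulated agent: a short name plus its standing instruction."""
    name: str
    instruction: str


# The five roles listed in the Methods; the wording of each instruction
# is a paraphrase, not the prompt used in the study.
ROLES = [
    Role("decision_maker", "Make the final diagnosis after considering the discussion."),
    Role("devils_advocate", "Challenge the working diagnosis to counter confirmation and anchoring biases."),
    Role("field_expert", "Contribute knowledge from the relevant medical subspecialty."),
    Role("facilitator", "Keep the discussion open to mitigate premature closure."),
    Role("recorder", "Record and summarize the findings."),
]

# A responder receives the speaking role, the case text, and the transcript
# so far, and returns that agent's next message. In a real system this would
# wrap an LLM call; here it is injected so the loop stays testable.
Responder = Callable[[Role, str, List[Tuple[str, str]]], str]


def run_discussion(
    case_text: str,
    roles: List[Role],
    respond: Responder,
    rounds: int = 1,
) -> List[Tuple[str, str]]:
    """Let each role speak once per round, in list order (an assumed order)."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(rounds):
        for role in roles:
            message = respond(role, case_text, transcript)
            transcript.append((role.name, message))
    return transcript


def echo_responder(role: Role, case_text: str, transcript: List[Tuple[str, str]]) -> str:
    """Placeholder responder that only restates the role's instruction."""
    return f"[{role.name}] turn {len(transcript) + 1}: {role.instruction}"


if __name__ == "__main__":
    for speaker, message in run_discussion("example case text", ROLES, echo_responder):
        print(speaker, "->", message)
```

Passing a subset of `ROLES` to `run_discussion` is one way to model the "varying combinations of these agents" that the study compared.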

RESULTS

A total of 240 responses were evaluated across 3 different multi-agent frameworks (16 cases repeated 5 times gives 80 responses per framework). The initial diagnosis had an accuracy of 0% (0/80). However, following multi-agent discussions, the accuracy for the top 2 differential diagnoses increased to 76% (61/80) for the best-performing multi-agent framework (Framework 4-C). This was significantly higher than the accuracy achieved by human evaluators (odds ratio 3.49; P=.002).
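The counts reported above can be checked with simple arithmetic, and the Fisher exact test named in the Methods can be sketched with the standard library. The abstract does not report the human evaluators' counts, so the test is demonstrated on the classic "lady tasting tea" 2x2 table rather than on study data. The function names are hypothetical, and the sample odds ratio computed here may not be the estimator behind the reported 3.49.

```python
from math import comb

# Figures taken from the abstract.
CASES = 16
REPEATS = 5
FRAMEWORKS = 3

responses_per_framework = CASES * REPEATS                  # 80
total_responses = responses_per_framework * FRAMEWORKS     # 240
best_accuracy = 61 / responses_per_framework               # 0.7625, reported as 76%


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))


def sample_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Unconditional sample odds ratio (a*d)/(b*c); infinite if b*c is 0."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


if __name__ == "__main__":
    print(responses_per_framework, total_responses, f"{best_accuracy:.2%}")
    # Textbook tea-tasting table, NOT data from this study.
    print(fisher_exact_two_sided(3, 1, 1, 3), sample_odds_ratio(3, 1, 1, 3))
```

To reproduce the reported comparison, one would replace the tea-tasting table with the correct and incorrect counts for Framework 4-C (61 and 19) and for the human evaluators, which are given in the full paper rather than in the abstract.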

CONCLUSIONS

The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios.
