Comparative Accuracy of Diagnosis by Collective Intelligence of Multiple Physicians vs Individual Physicians.

Affiliations

Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.

Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.

Publication information

JAMA Netw Open. 2019 Mar 1;2(3):e190096. doi: 10.1001/jamanetworkopen.2019.0096.

DOI: 10.1001/jamanetworkopen.2019.0096

PMID: 30821822

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6484633/

Abstract

IMPORTANCE

The traditional approach of diagnosis by individual physicians has a high rate of misdiagnosis. Pooling multiple physicians' diagnoses (collective intelligence) is a promising approach to reducing misdiagnoses, but its accuracy in clinical cases is unknown to date.

OBJECTIVE

To assess how the diagnostic accuracy of groups of physicians and trainees compares with the diagnostic accuracy of individual physicians.

DESIGN, SETTING, AND PARTICIPANTS

Cross-sectional study using data from the Human Diagnosis Project (Human Dx), a multicountry data set of ranked differential diagnoses by individual physicians, graduate trainees, and medical students (users) solving user-submitted, structured clinical cases. From May 7, 2014, to October 5, 2016, groups of 2 to 9 randomly selected physicians solved individual cases. Data analysis was performed from March 16, 2017, to July 30, 2018.

MAIN OUTCOMES AND MEASURES

The primary outcome was diagnostic accuracy, assessed as a correct diagnosis in the top 3 ranked diagnoses for an individual; for groups, the top 3 diagnoses were a collective differential generated using a weighted combination of user diagnoses with a variety of approaches. A version of the McNemar test was used to account for clustering across repeated solvers to compare diagnostic accuracy.
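The outcome definition above can be made concrete with a small sketch. The study pooled ranked differentials with a weighted combination and tried a variety of approaches, but the abstract does not give the weights. The code below therefore assumes a hypothetical scheme: each solver gives a diagnosis 1/rank points, diagnoses are matched by exact string, and ties break alphabetically. It then scores top-3 correctness and compares paired individual vs group outcomes with the plain exact McNemar test. This is not the clustering-adjusted McNemar version the authors used, and all function names and example diagnoses are illustrative assumptions.

```python
"""Sketch: pool ranked differentials into a collective top 3 and compare
paired accuracy with an exact McNemar test.

Assumptions (not from the study): 1/rank weighting, exact-string matching of
diagnoses, alphabetical tie-breaking, and the plain exact McNemar test with
no adjustment for clustering across repeated solvers.
"""
from collections import defaultdict
from math import comb


def collective_top3(differentials):
    """Combine several ranked differentials (best first) into one top-3 list."""
    scores = defaultdict(float)
    for ranked in differentials:
        for rank, dx in enumerate(ranked, start=1):
            scores[dx] += 1.0 / rank  # hypothetical 1/rank weighting
    # Highest pooled score first; ties broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda dx: (-scores[dx], dx))
    return ordered[:3]


def correct_in_top3(ranked, answer):
    """Primary outcome for an individual or a group: answer among the top 3."""
    return answer in ranked[:3]


def mcnemar_exact(individual_correct, group_correct):
    """Two-sided exact McNemar P value for paired binary outcomes."""
    pairs = list(zip(individual_correct, group_correct))
    b = sum(1 for i, g in pairs if i and not g)  # only the individual correct
    c = sum(1 for i, g in pairs if g and not i)  # only the group correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under H0 discordant pairs split 50/50: P = 2 * P(X <= k), X ~ Bin(n, 0.5).
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Made-up differentials from three hypothetical solvers for one case.
    group = collective_top3([
        ["pneumonia", "bronchitis", "pulmonary embolism"],
        ["pulmonary embolism", "pneumonia"],
        ["pneumonia", "heart failure", "pulmonary embolism"],
    ])
    print(group)  # ['pneumonia', 'pulmonary embolism', 'bronchitis']
    print(correct_in_top3(group, "pulmonary embolism"))  # True

    # Made-up paired outcomes on five cases: individual vs group.
    individual = [True, False, False, False, True]
    pooled = [True, True, True, True, True]
    print(mcnemar_exact(individual, pooled))  # 0.25
```

With these made-up inputs, pneumonia scores 2.5, pulmonary embolism about 1.67, and the 0.5-point tie between bronchitis and heart failure falls to alphabetical order. A faithful replication would also need to reconcile synonymous diagnoses and account for the same users solving many cases, which this sketch ignores.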

RESULTS

Of the 2069 users solving 1572 cases from the Human Dx data set, 1228 (59.4%) were residents or fellows, 431 (20.8%) were attending physicians, and 410 (19.8%) were medical students. Collective intelligence was associated with increasing diagnostic accuracy, from 62.5% (95% CI, 60.1%-64.9%) for individual physicians up to 85.6% (95% CI, 83.9%-87.4%) for groups of 9 (23.0% difference; 95% CI, 14.9%-31.2%; P < .001). The range of improvement varied by the specifications used for combining groups' diagnoses, but groups consistently outperformed individuals regardless of approach. Absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom from an increase of 17.3% (95% CI, 6.4%-28.2%; P = .002) for abdominal pain to 29.8% (95% CI, 3.7%-55.8%; P = .02) for fever. Groups from 2 users (77.7% accuracy; 95% CI, 70.1%-84.6%) to 9 users (85.5% accuracy; 95% CI, 75.1%-95.9%) outperformed individual specialists in their subspecialty (66.3% accuracy; 95% CI, 59.1%-73.5%; P < .001 vs groups of 2 and 9).

CONCLUSIONS AND RELEVANCE

A collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases. Given the few proven strategies to address misdiagnosis, this technique merits further study in clinical settings.


Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/15a9/6484633/df8c408b6a24/jamanetwopen-2-e190096-g001.jpg




