Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts.
Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts.
JAMA Netw Open. 2019 Mar 1;2(3):e190096. doi: 10.1001/jamanetworkopen.2019.0096.
IMPORTANCE: The traditional approach of diagnosis by individual physicians has a high rate of misdiagnosis. Pooling multiple physicians' diagnoses (collective intelligence) is a promising approach to reducing misdiagnoses, but its accuracy in clinical cases is unknown to date.
OBJECTIVE: To assess how the diagnostic accuracy of groups of physicians and trainees compares with the diagnostic accuracy of individual physicians.
DESIGN, SETTING, AND PARTICIPANTS: Cross-sectional study using data from the Human Diagnosis Project (Human Dx), a multicountry data set of ranked differential diagnoses by individual physicians, graduate trainees, and medical students (users) solving user-submitted, structured clinical cases. From May 7, 2014, to October 5, 2016, groups of 2 to 9 randomly selected physicians solved individual cases. Data analysis was performed from March 16, 2017, to July 30, 2018.
MAIN OUTCOMES AND MEASURES: The primary outcome was diagnostic accuracy. For an individual, a case counted as correct if the correct diagnosis appeared among that user's top 3 ranked diagnoses. For a group, the top 3 diagnoses were taken from a collective differential generated as a weighted combination of the members' ranked diagnoses, using a variety of weighting approaches. Diagnostic accuracy was compared using a version of the McNemar test that accounts for clustering across repeated solvers.
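The pooling step described above can be illustrated with a minimal sketch. The abstract does not state which weights were used, so the 1/rank weighting below is an assumption chosen for illustration, as are the function names `collective_differential` and `is_correct` and the example diagnoses. The sketch also assumes diagnosis strings have already been normalized so that identical diagnoses match exactly.

```python
from collections import defaultdict


def collective_differential(ranked_lists, top_k=3):
    """Pool several ranked differentials into one collective top-k list.

    Assumption (not from the source): a diagnosis at rank r (1-based)
    contributes a weight of 1/r, and weights are summed across solvers.
    """
    scores = defaultdict(float)
    for differential in ranked_lists:
        for rank, diagnosis in enumerate(differential, start=1):
            scores[diagnosis] += 1.0 / rank
    # Highest pooled score first; ties broken alphabetically for determinism.
    ordered = sorted(scores, key=lambda d: (-scores[d], d))
    return ordered[:top_k]


def is_correct(differential, true_diagnosis, top_k=3):
    """Return True if the true diagnosis is among the top_k ranked entries."""
    return true_diagnosis in differential[:top_k]


# Hypothetical ranked differentials from three solvers for one case.
solvers = [
    ["pneumonia", "pulmonary embolism", "bronchitis"],
    ["pulmonary embolism", "pneumonia", "heart failure"],
    ["bronchitis", "pulmonary embolism", "asthma"],
]
pooled = collective_differential(solvers)
print(pooled)
print(is_correct(pooled, "pulmonary embolism"))
```

In this toy case, pulmonary embolism is ranked first by only one solver but appears in all three lists, so it receives the highest pooled score. Other weighting schemes (for example, equal weights for any top-3 mention) would slot into the same structure by changing the line that accumulates `scores`.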
RESULTS: Of the 2069 users solving 1572 cases from the Human Dx data set, 1228 (59.4%) were residents or fellows, 431 (20.8%) were attending physicians, and 410 (19.8%) were medical students. Collective intelligence was associated with increasing diagnostic accuracy, from 62.5% (95% CI, 60.1%-64.9%) for individual physicians up to 85.6% (95% CI, 83.9%-87.4%) for groups of 9 (23.0% difference; 95% CI, 14.9%-31.2%; P < .001). The range of improvement varied by the specifications used for combining groups' diagnoses, but groups consistently outperformed individuals regardless of approach. Absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom from an increase of 17.3% (95% CI, 6.4%-28.2%; P = .002) for abdominal pain to 29.8% (95% CI, 3.7%-55.8%; P = .02) for fever. Groups from 2 users (77.7% accuracy; 95% CI, 70.1%-84.6%) to 9 users (85.5% accuracy; 95% CI, 75.1%-95.9%) outperformed individual specialists in their subspecialty (66.3% accuracy; 95% CI, 59.1%-73.5%; P < .001 vs groups of 2 and 9).
CONCLUSIONS AND RELEVANCE: A collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases. Given the few proven strategies to address misdiagnosis, this technique merits further study in clinical settings.