From the Institute of Diagnostic and Interventional Radiology, University of Cologne, Faculty of Medicine and University Hospital Cologne, Kerpener Str 62, 50937 Cologne, Germany (T.D., X.C., M.P., D.M., D.P.d.S.); School of Business and Economics, Knowledge, Information and Innovation, Vrije Universiteit Amsterdam, Amsterdam, the Netherlands (M.R.M.); Institute of Interventional Radiology, University Clinic Schleswig-Holstein, Kiel, Germany (R.K.); Department of Diagnostic and Interventional Radiology, University Medical Centre of the Johannes Gutenberg-University Mainz, Mainz, Germany (A.M.K.); and Institute of Diagnostic and Interventional Radiology, University Clinic Würzburg, Würzburg, Germany (B.B., S.S.).
Radiology. 2023 May;307(4):e222176. doi: 10.1148/radiol.222176. Epub 2023 May 2.
Background: Automation bias (the propensity for humans to favor suggestions from automated decision-making systems) is a known source of error in human-machine interactions, but its implications for artificial intelligence (AI)-aided mammography reading are unknown.

Purpose: To determine how automation bias affects inexperienced, moderately experienced, and very experienced radiologists when reading mammograms with the aid of an AI system.

Materials and Methods: In this prospective experiment, 27 radiologists read 50 mammograms and provided their Breast Imaging Reporting and Data System (BI-RADS) assessment assisted by a purported AI system. Mammograms were obtained between January 2017 and December 2019 and were presented in two randomized sets. The first was a training set of 10 mammograms for which the AI system suggested the correct BI-RADS category. The second was a set of 40 mammograms in which the AI system suggested an incorrect BI-RADS category for 12 mammograms. Reader performance, degree of bias in BI-RADS scoring, perceived accuracy of the AI system, and reader confidence in their own BI-RADS ratings were assessed using analysis of variance (ANOVA) and repeated-measures ANOVA followed by post hoc tests, and Kruskal-Wallis tests followed by the Dunn post hoc test.

Results: The percentage of correctly rated mammograms by inexperienced (mean, 79.7% ± 11.7 [SD] vs 19.8% ± 14.0; P < .001; r = 0.93), moderately experienced (mean, 81.3% ± 10.1 vs 24.8% ± 11.6; P < .001; r = 0.96), and very experienced (mean, 82.3% ± 4.2 vs 45.5% ± 9.1; P = .003; r = 0.97) radiologists was significantly affected by the correctness of the AI-suggested BI-RADS category. When the purported AI incorrectly suggested a BI-RADS category higher than the ground truth, inexperienced radiologists were significantly more likely to follow the suggestion than both moderately experienced (mean degree of bias, 4.0 ± 1.8 vs 2.4 ± 1.5; P = .044; r = 0.46) and very experienced (mean degree of bias, 4.0 ± 1.8 vs 1.2 ± 0.8; P = .009; r = 0.65) readers.

Conclusion: Inexperienced, moderately experienced, and very experienced radiologists reading mammograms are all prone to automation bias when supported by an AI-based system. This and other effects of human-machine interaction must be considered to ensure safe deployment and accurate diagnostic performance when combining human readers and AI.

© RSNA, 2023. See also the editorial by Baltzer in this issue.
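The statistical analysis named in Materials and Methods (repeated-measures ANOVA for the within-reader effect of AI correctness, and Kruskal-Wallis tests with the Dunn post hoc test for between-group comparisons of bias) can be sketched in Python on simulated data. This is a minimal illustration, not the authors' code: the per-reader values, the equal split of nine readers per experience group, and the column names are assumptions made only for the example, and the study's one-way ANOVA and confidence analyses are omitted.

```python
import numpy as np
import pandas as pd
from scipy import stats
import scikit_posthocs as sp
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(42)

# Simulated long-format reader data (illustrative values only): 27 readers in
# 3 experience groups, with the percentage of correctly rated mammograms when
# the AI suggestion was correct vs incorrect.
groups = np.repeat(["inexperienced", "moderate", "very"], 9)
rows = []
for reader, grp in enumerate(groups):
    pct_when_ai_correct = rng.normal(80, 10)
    pct_when_ai_wrong = rng.normal(
        {"inexperienced": 20, "moderate": 25, "very": 45}[grp], 12)
    rows.append({"reader": reader, "experience": grp,
                 "ai": "correct", "pct_correct": pct_when_ai_correct})
    rows.append({"reader": reader, "experience": grp,
                 "ai": "incorrect", "pct_correct": pct_when_ai_wrong})
df = pd.DataFrame(rows)

# Repeated-measures ANOVA: within-reader effect of AI correctness on accuracy.
rm = AnovaRM(df, depvar="pct_correct", subject="reader", within=["ai"]).fit()
print(rm.anova_table)

# Degree-of-bias comparison across experience groups:
# Kruskal-Wallis test followed by the Dunn post hoc test.
bias = pd.DataFrame({
    "experience": groups,
    "bias": np.concatenate([rng.normal(4.0, 1.8, 9),    # inexperienced
                            rng.normal(2.4, 1.5, 9),    # moderately experienced
                            rng.normal(1.2, 0.8, 9)]),  # very experienced
})
samples = [g["bias"].to_numpy() for _, g in bias.groupby("experience")]
print(stats.kruskal(*samples))
print(sp.posthoc_dunn(bias, val_col="bias", group_col="experience",
                      p_adjust="bonferroni"))
```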