Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
Department of Computer Science, Columbia University, New York, NY 10027, USA.
Cell Rep Med. 2023 Oct 17;4(10):101207. doi: 10.1016/j.xcrm.2023.101207. Epub 2023 Sep 27.
Clinical decision support tools can improve diagnostic performance or reduce variability, but they are also subject to post-deployment underperformance. Although using AI in an assistive setting offsets many concerns with autonomous AI in medicine, systems that present all predictions equivalently fail to protect against key AI safety concerns. We design a decision pipeline that supports the diagnostic model with an ecosystem of models, integrating disagreement prediction, clinical significance categorization, and prediction quality modeling to guide prediction presentation. We characterize disagreement using data from a deployed chest X-ray interpretation aid and compare clinician burden in this proposed pipeline to the diagnostic model in isolation. The average disagreement rate is 6.5%, and the expected burden reduction is 4.8%, even if 5% of disagreements on urgent findings receive a second read. We conclude that, in our production setting, we can adequately balance risk mitigation with clinician burden if disagreement false positives are reduced.
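The routing logic described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all names (`Prediction`, `route`, the thresholds `disagreement_threshold` and `quality_floor`) are assumptions, standing in for the paper's three auxiliary models — disagreement prediction, clinical significance categorization, and prediction quality modeling — that together decide how each diagnostic prediction is presented.

```python
# Hypothetical sketch of the decision pipeline: an ecosystem of auxiliary
# models around the diagnostic model guides how each prediction is presented.
from dataclasses import dataclass
from enum import Enum

class Significance(Enum):
    URGENT = "urgent"
    NON_URGENT = "non_urgent"

class Presentation(Enum):
    SHOW = "show"                # present the prediction normally
    SECOND_READ = "second_read"  # flag for a second clinician read
    WITHHOLD = "withhold"        # modeled quality too low to present

@dataclass
class Prediction:
    finding: str
    disagreement_prob: float   # output of a disagreement-prediction model
    significance: Significance # output of a significance categorizer
    quality: float             # output of a prediction-quality model

def route(p: Prediction,
          disagreement_threshold: float = 0.5,
          quality_floor: float = 0.3) -> Presentation:
    """Combine the three auxiliary signals to choose a presentation mode.

    Thresholds are illustrative placeholders, not values from the paper.
    """
    if p.quality < quality_floor:
        return Presentation.WITHHOLD
    if (p.disagreement_prob >= disagreement_threshold
            and p.significance is Significance.URGENT):
        return Presentation.SECOND_READ
    return Presentation.SHOW

# Example: an urgent finding with high predicted disagreement is escalated.
p = Prediction("pneumothorax", 0.8, Significance.URGENT, 0.9)
print(route(p).value)  # second_read
```

Under this sketch, clinician burden depends directly on how often urgent disagreements cross the escalation threshold, which is why the abstract's conclusion hinges on reducing disagreement false positives.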