Moffett Luke, Barnett Alina Jade, Donnelly Jon, Schwartz Fides Regina, Trivedi Hari, Lo Joseph, Rudin Cynthia
Department of Computer Science, Duke University, Durham, North Carolina, United States of America.
Department of Radiology, Brigham and Women's Hospital, Boston, Massachusetts, United States of America.
PLoS One. 2025 Jun 26;20(6):e0320091. doi: 10.1371/journal.pone.0320091. eCollection 2025.
An external validation of IAIA-BL, a deep learning-based, inherently interpretable breast lesion malignancy prediction model, was performed on two patient populations: 207 women aged 31 to 96 (425 mammograms) from iCAD, and 58 women (104 mammograms) from Emory University. This is the first external validation of an inherently interpretable, deep learning-based lesion classification model. IAIA-BL and black-box baseline models had lower mass margin classification performance, as measured by AUC, on the external datasets than on the internal dataset. These losses correlated with a smaller reduction in malignancy classification performance, though the AUC 95% confidence intervals overlapped for all sites. However, interpretability, as measured by model activation on relevant portions of the lesion, was maintained across all populations. Together, these results show that model interpretability can generalize even when performance does not.
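The abstract compares sites by AUC and by whether the 95% confidence intervals overlap. As a minimal sketch of that kind of comparison, the Python below computes AUC from pairwise rankings and a percentile bootstrap interval. It is not the authors' procedure: the abstract does not say how the intervals were computed, and the function names `auc`, `bootstrap_auc_ci`, and `intervals_overlap` are hypothetical. It also assumes resampling at the image level, whereas a study with several mammograms per patient might reasonably resample by patient instead.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC (assumed method, image-level)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def intervals_overlap(a, b):
    """True when closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Calling `bootstrap_auc_ci` once per site on that site's labels and model scores, then passing the two results to `intervals_overlap`, reproduces the style of comparison described above. Overlapping intervals are only an informal indication that a difference may not be significant, not a formal test.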